%nsf[w88,jmc] 1988 NSF Basic Research in AI Proposal
%\input memo.tex[let,jmc]
\magnification\magstep1
\parskip 6pt
\hsize 6true in
\vsize 8.5true in
\hoffset .25true in
% date and time
\newcount\hours
\newcount\minutes
\newcount\temp
\newtoks\ampm
% set the real time of day
\def\setdaytime{%
\temp\time
\divide\temp 60
\hours\temp
\multiply\temp -60
\minutes\time
\advance\minutes \temp
\ifnum\hours =12 \ampm={p.m.} \else
\ifnum\hours >12 \advance\hours -12 \ampm={p.m.} \else \ampm={a.m.} \fi \fi
}
\setdaytime
\def\theMonth{\relax
\ifcase\month\or
January\or February\or March\or April\or May\or June\or
July\or August\or September\or October\or November\or December\fi}
\def\theTime{\the\hours:\ifnum \minutes < 10 0\fi \the\minutes\ \the\ampm}
\def\cltdate{\rm \theMonth\space\number\day\space\number\year}
\def\jmcdate{\rm \number\year\space\theMonth\space\number\day}
\def\today{\cltdate}
\def\hcr{\hidewidth\cr}
\newcount\scount
\newcount\ecount
\def\leql#1{
\global\advance\ecount 1
\expandafter\xdef\csname eqlab#1\endcsname{\number\ecount}
\leqno(\rm\the\ecount)}
\def\eqref#1{\csname eqlab#1\endcsname}
\def\by{\bigskip\centerline{\bf by John McCarthy, Stanford University}}
\def\title#1{\halign{\centerline{\bf ##}\cr#1\crcr}\by}
%Sections - allocating and defaults
% section numerals are arabic \sectNum
% default \sectFont is \tenbf
% sets mark so that
% \expandafter\iffalse\botmark\fi → \sectName
% \iftrue\botmark\fi → \sectNum
\newcount\sectCounter
\def\sectNumeral#1{\number#1}
\def\sectNum{\number\sectCounter} % current section
\def\sectPre#1{\ifnum #1=0 \else\number #1.\fi}
\newtoks\sectName
\newskip\sectPreSkip \sectPreSkip=21 pt plus 5pt minus 5pt
\newskip\sectPostSkip \sectPostSkip=7 pt plus 3pt minus 2pt
\def\sectBreak{\par
\ifdim\lastskip<\sectPreSkip%
\removelastskip\penalty-300\vskip\sectPreSkip
\else\penalty-300\fi}
\def\sectMark{\mark{\sectNum\noexpand\else \the\sectName}}
\def\sectResetList{\\{\ssectCounter}\\{\sssectCounter}}
\def\sectFont{\tenbf}
\def\sectHeading{\leftline{\sectFont \sectNum.\quad \the\sectName}}
\def\section#1{%
\sectBreak
\global\sectName={#1}
\global\advance\sectCounter1
% \resetCounters{\sectResetList}
\global\ssectCounter=0 \global\sssectCounter=0
% \sectTocEntry
\sectHeading % the first text in section
\sectMark
\nobreak
\vskip\sectPostSkip
}
%subSections
\newcount\ssectCounter
\def\ssectNumeral#1{\number#1}
\def\ssectNum{% of current section
\sectPre{\sectCounter}\ssectNumeral{\ssectCounter}}
\def\ssectPre#1{\ifnum #1=0 \else\ssectNumeral{#1}.\fi}
\newtoks\ssectName
\newskip\ssectPreSkip \ssectPreSkip=14 pt plus 5pt minus 3pt
\newskip\ssectPostSkip \ssectPostSkip=3 pt plus 2pt minus 1pt
\def\ssectBreak{\par
\ifdim\lastskip<\ssectPreSkip%
\removelastskip\penalty-100\vskip\ssectPreSkip
\else\penalty-100\fi}
\def\ssectFont{\tenbf}
\def\ssectHeading{\leftline{\ssectFont \ssectNum.\quad \the\ssectName}}
\def\subsection#1{%
\ssectBreak
\global\advance\ssectCounter1
\global\ssectName{#1}
% \resetCounters{\ssectResetList}
\global\sssectCounter=0
% \ssectTocEntry
\ssectHeading % the first text in ssection
\nobreak
\vskip\ssectPostSkip
}
%subsubSections
\newcount\sssectCounter
\def\sssectNumeral#1{\number#1}
\def\sssectPre#1{\ifnum #1=0 \else\sssectNumeral{#1}.\fi}
\def\sssectNum{% of current sssection
\sectPre{\sectCounter}\ssectPre{\ssectCounter}%
\sssectNumeral{\sssectCounter}}
\newtoks\sssectName
\newskip\sssectPreSkip \sssectPreSkip=7 pt plus 5pt minus 3pt
\newskip\sssectPostSkip \sssectPostSkip=0 pt plus 2pt minus 1pt
\def\sssectBreak{\par
\ifdim\lastskip<\sssectPreSkip%
\removelastskip\penalty-100\vskip\sssectPreSkip
\else\penalty-100\fi}
\def\sssectFont{\tenbf}
\def\sssectHeading{\leftline{\sssectFont \sssectNum.\quad \the\sssectName}}
\def\subsubsection#1{%
\sssectBreak
\global\advance\sssectCounter1
\global\sssectName{#1}
% \sssectTocEntry
\sssectHeading % the first text in sssection
\nobreak
\vskip\sssectPostSkip
}
\overfullrule =0pt
\def\beginlsverbatim{
\begingroup
\parindent0pt
\parskip 0pt plus 1pt
\def\par{\leavevmode\endgraf}
\obeylines
\obeyspaces
}
\def\endlsverbatim{\endgroup}
\settabs 16 \columns
\title{PROPOSAL FOR BASIC RESEARCH IN ARTIFICIAL INTELLIGENCE}
\section{Introduction}
This is a proposal for renewed support of basic research in
artificial intelligence by John McCarthy and students. We propose to add
Vladimir Lifschitz, a Senior Research Associate, to the proposal starting
in July 1989.
Most of the work is in the epistemological part of AI. We will mainly
concentrate on developing formal methods of {\it nonmonotonic
reasoning} and using them to express commonsense knowledge and reasoning.
Other epistemological problems that we are planning to investigate
include formalizing {\it contexts} and {\it
mental situations}. We hope that this work will make it possible to develop a
{\it general-purpose database of common sense facts}.
McCarthy has also recently returned to the subject of {\it heuristics} and
their formal expression; work will continue in this direction as long as
it proves fruitful.
The role of basic research in AI is discussed in McCarthy's
AI Magazine article (McCarthy 1983),
included with this proposal as Appendix A.
More specifically, the proposed work represents a line of research that
started in the late 1950s and is based on the use of mathematical logic;
a nonmathematical discussion of the logic approach to AI (the absence of
mathematics was a requirement of the editor) can be found in the paper
(McCarthy 1988a), an excerpt from which is included as Appendix B.
We will first summarize the results of recent work on
nonmonotonic reasoning and its applications,
and then discuss the problems we propose to study in the next three years.
\section{Recent Results on Nonmonotonic Reasoning}
(McCarthy 1977) contains the first published work on formalized
nonmonotonic reasoning, introducing a version of circumscription that
is now mostly obsolete. A conference on nonmonotonic reasoning was
held at Stanford in 1978, and {\it Artificial Intelligence} devoted
a special issue to the subject in 1980, including (McCarthy 1980),
(McDermott and Doyle 1980) and (Reiter 1980). The first paper introduced
{\it circumscription}, the second introduced a now obsolete
{\it nonmonotonic logic}, and the third introduced a {\it logic of defaults}.
Autoepistemic logic (Moore 1985) is a newer significant nonmonotonic
formalism. In (McCarthy 1986), included with this proposal as Appendix C,
circumscription was further developed, and a formalism of abnormality predicates
was introduced.
These papers started an active branch of research that has resulted
already in several hundred papers;
four of the AAAI-87 and IJCAI-87 best paper awards were given to papers
on nonmonotonic reasoning. The 1984 AAAI-sponsored workshop
in New Paltz and the 1988 international conference
in Grassau, West Germany were important landmarks in the field.
Recent work has given us much better understanding of mathematical properties of
nonmonotonic formalisms and of their potential as tools for formalizing
commonsense knowledge and reasoning.
\subsection{Relation Between Nonmonotonic Formalisms}
Reiter's default logic and Moore's autoepistemic logic
are similar to each other in the sense that, semantically,
each is based on a ``fixpoint''
construction. Konolige (1987) proved that these formalisms are
essentially equivalent.
Unlike these two systems, circumscription isn't really a new logic; it is
merely a syntactic transformation of formulas that
allows us to formalize nonmonotonic reasoning in classical logic. (This
transformation has a simple model-theoretic meaning: A model of
circumscription is a ``minimal'' model of the axioms.) However, the
reformulation of autoepistemic logic proposed in (Levesque 1987) shows that
this difference is perhaps not so fundamental. Levesque defines the semantics of
autoepistemic logic by translating it into a modal logic with
the ``all I know'' operator. This modal logic is monotonic and has a Kripke-type
semantics.
Default logic and autoepistemic logic provide some expressive possibilities
that seem to have no counterparts in circumscription. Circumscribing a predicate
$P$ corresponds to the use of the default
${:¬P(x)} / {¬P(x)}$
(``If it is consistent that $¬P(x)$, then $¬P(x)$''). This default
has a very special
form: It is a ``normal default without a prerequisite''.
The corresponding autoepistemic axiom is
$P(x) ⊃ L P(x)$
(``$P(x)$ only if I believe that $P(x)$''). Such ``normal''
axioms are, syntactically, very special also.
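As an illustration (using the standard flying-birds example rather than an
example from the papers cited), circumscribing $ab$ in the axiom
$$∀x[bird(x)∧¬ab(x)⊃flies(x)]$$
corresponds to the default ${:¬ab(x)} / {¬ab(x)}$ and to the autoepistemic
axiom $ab(x) ⊃ L ab(x)$; under all three formalisms, $flies(Tweety)$ follows
from $bird(Tweety)$ in the absence of information that Tweety is abnormal.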
Recent work on applications of formal nonmonotonic reasoning shows that
nonnormal defaults and nonnormal autoepistemic axioms may be quite useful. We
will briefly discuss this work in the
sections on logic programming, on inheritance systems and on reasoning about
action. These ideas apparently cannot be expressed in terms of
traditional circumscription.
On the other hand, circumscription has a number of attractive features in
comparison with autoepistemic logic:
1. It doesn't involve modal operators, and thus is more
economical in the use of logical tools.
2. It handles quantifiers and equality better.
3. The use of autoepistemic logic leads to inconsistencies very easily.
For instance, the theory with the single axiom $L p$ has no stable expansions:
a stable expansion could not contain the unsupported atom $p$, hence would
have to contain $¬Lp$, contradicting the axiom.
Circumscription is different in this respect (Lifschitz 1986). (An approach to
correcting this problem with autoepistemic logic is proposed in (Morris 1988).)
This discussion leads us to the problem of developing new approaches to
formalizing nonmonotonic reasoning that will combine the
attractive features of circumscription, on the one hand, and of default logic
and autoepistemic logic, on the other. In the section on
introspective circumscription we describe one promising idea of this kind that we
are planning to investigate.
\subsection{Relation of Nonmonotonic Formalisms to Logic Programming}
A close connection has been uncovered between formalized nonmonotonic
reasoning and the semantics of negation in logic programming. In (Lifschitz
1988), the semantics of stratified programs was reformulated in terms of
circumscription. This led Przymusinski (1988) to a more general ``perfect
model'' semantics.
In (Gelfond 1987), general logic programs are translated into
autoepistemic theories
by inserting the modal operator $L$ after each negation. This translation
provides a simple and elegant declarative semantics for a wide class of logic
programs; see also (Gelfond and Lifschitz 1988). The autoepistemic theories
obtained as the result of Gelfond's translation are nonnormal.
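For instance (our example rather than Gelfond's), the program clause
$$flies(x) ← bird(x),\ not\ ab(x)$$
is translated into the autoepistemic axiom
$$bird(x) ∧ ¬L\,ab(x) ⊃ flies(x),$$
which is not of the normal form described above.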
\subsection{Automation of Nonmonotonic Reasoning}
Systems capable of performing some forms of
nonmonotonic reasoning have been available for a long time:
Implementations of logic programming languages and of frame languages with
defaults, as well as truth maintenance systems (Doyle 1979), (de Kleer
1986) belong to this category.
Theoretical work on formalized
nonmonotonic reasoning leads to extensions of these computational methods
and to new approaches to the automation of nonmonotonic reasoning. In
(Lifschitz 1985), a generalization of the process of predicate completion
is used for expressing special cases of circumscription by
first order formulas. Further results of this type are proved in
(Lifschitz 1987a), (Kolaitis and Papadimitriou 1988) and (Rabinov 1988).
Algorithms for determining the effect of circumscription
are proposed in (Minker and Perlis 1985), (Gelfond and Przymusinska 1986)
and (Przymusinski 1986). Gelfond and
Lifschitz (1988) propose to compile special cases of circumscription into
logic programs.
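To illustrate predicate completion with a textbook example (not one from
the papers cited): completing the clauses $p(a)$ and $q(b)⊃p(b)$ with respect
to $p$ gives the first-order formula
$$∀x[p(x) ≡ x=a ∨ (x=b ∧ q(b))],$$
which, under suitable conditions, expresses the circumscription of $p$.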
Implemented nonmonotonic reasoning systems are described in (Poole
{\it et al.} 1987) and (Ginsberg 1988).
\subsection{Inheritance Systems}
Recent work on the semantics of inheritance systems with exceptions
has led to several definitions that are ``equally sound and intuitive, but
do not always agree'' with each other (Touretzky {\it et al.\/} 1987).
Hopefully, additional insight into this ``clash of intuitions''
can be gained by treating inheritance reasoning as a special case of
general nonmonotonic reasoning. Circumscription is applied to formalizing
inheritance systems in (McCarthy 1986), (Brewka 1987) and (Haugh 1988). In
(Gelfond 1988) and (Gelfond and Przymusinska 1988), inheritance systems
are represented by nonnormal autoepistemic theories.
\subsection{Diagnostic Reasoning}
Reiter (1987) showed how abnormality
predicates can be used for describing the abnormal functioning of a device.
Diagnostic reasoning attempts to determine the extents of the
abnormality predicates, whereas in other kinds of nonmonotonic reasoning
those predicates play an auxiliary role.
\subsection{Reasoning about Action}
Facts about action and change can be expressed in first-order languages
with symbols for {\it situations} and {\it fluents} (functions of situations).
This formalism is known as the {\it situation calculus} (McCarthy and Hayes
1969). Nonmonotonic
reasoning has permitted inferring the effects of actions from much weaker
premises than are needed when deduction is the only tool available. This
is discussed in (McCarthy 1980) and (McCarthy 1986).
A difficulty with nonmonotonic reasoning about action was independently
discovered by Lifschitz (McCarthy 1986) and
Hanks and McDermott (1986). This difficulty gives rise to
unintended minimal models of abnormality in the circumscription formalism
and also to unintended models when the logic of defaults is used. Several
formalisms that get around the difficulties have been discovered; a critical
discussion of some proposals can be found in (Hanks and McDermott 1987).
Two of the proposed solutions seem especially attractive: one based on
the axiomatization of causality (Lifschitz 1987, Haugh 1987),
the other using nonnormal defaults (Morris 1987, Gelfond 1988).
At first researchers dealt primarily with the simplest form of
reasoning about action, known as ``temporal projection.'' In temporal
projection problems, an initial situation is assumed known, and the goal
is to find out how the world will change after the execution of a given sequence
of actions. More recently researchers have turned to the problem of ``temporal explanation,''
which involves reasoning about partially unknown past events (Morgenstern and
Stein 1988, Lifschitz and Rabinov 1988).
An alternative to the situation calculus, the {\it event calculus}
proposed in (Kowalski and Sergot 1986), is a method for formalizing
reasoning about action on the basis of logic programming.
The papers (Lifschitz 1987) and (Lifschitz and Rabinov 1988) are
included below as Appendices D and E.
\section{Proposed Work on Nonmonotonic Reasoning}
\subsection{Introspective Circumscription}
In the section on the relation between nonmonotonic formalisms we explained
the importance of developing approaches to
formalizing nonmonotonic reasoning that combine the
attractive features of several methods available now. In the best possible world,
a counterpart of nonnormal defaults would be available, as well as the full
power of circumscription, while modelling nonmonotonicity would
be based on a simple syntactic transformation of formulas of classical logic.
A step in this direction is made in (Perlis 1988), where the definition of
``autocircumscription'' is proposed. It is formally similar to the definition
of circumscription. But the new concept ``is not really circumscription
at all, in the sense that it does not aim at minimizing extensions''. Instead,
it is ``related to self knowledge, and especially to negative introspection.''
Lifschitz (1988a) introduced another modification of circumscription
along the lines of Perlis's work. He calls this form of circumscription
``introspective.'' The simplest case is defined as follows.
Consider two unary predicates, $P$ and $LP$.
The formula $LP(x)$ reads: $P(x)$ {\it is believed}. (The choice of
notation $LP$ is related to the use of the modal operator L in autoepistemic
logic.) For any sentence $A(P,LP)$, its {\it introspective circumscription}
is the formula
$$A(P,LP)∧∀x[LP(x)≡∀p(A(p,LP)⊃p(x))].$$
Informally, we think of $A$ as the conjunction of all beliefs that an
agent has (i) about the predicate $P$ and (ii) about his beliefs
about the predicate $P$. The second conjunctive term in the formula above
expresses the ``rationality''
of the agent: he believes that $P(x)$ holds if and only if this
follows from his beliefs $A$.
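As a minimal worked example (ours, not from the paper), take $A(P,LP)$ to be
simply $P(a)$. Then $∀p(A(p,LP)⊃p(x))$ becomes $∀p(p(a)⊃p(x))$, which is
equivalent to $x=a$, so that introspective circumscription gives
$$P(a)∧∀x[LP(x)≡x=a],$$
i.e., the agent believes $P(a)$ and nothing else about $P$.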
Although introspective circumscription does not use modal operators, it is clearly
based on the same intuition as autoepistemic logic.
The formal properties of these concepts are similar.
The models of introspective
circumscription can be described as fixpoints of a certain operator, just like
stable expansions (or extensions in Reiter's logic of defaults).
At the same time, according to (Lifschitz 1988a),
McCarthy's ``minimizing'' circumscription is included in the new version as a
special case.
We are planning to further investigate properties of this new theory of
nonmonotonic reasoning and to apply it to formalizing commonsense knowledge.
One particular application is related to reasoning about action. We have
mentioned two competing approaches to this problem, one based on axiomatizing
causality, the other on nonnormal defaults. The task of comparing these
approaches is complicated by the fact that they are based on different
mathematical models of nonmonotonicity: the former on circumscription, and the
latter on autoepistemic logic. Hopefully, the possibility of recasting both
ideas in terms of introspective circumscription will be helpful.
\subsection{Ramification Problem}
Lifschitz (1987) describes the effects of actions using the predicates
$causes(a,f,v)$ (the action $a$ causes the fluent $f$ to take on the value $v$)
and $precond(f,a)$ ($f$ is a precondition for the successful execution of $a$).
For instance, the formulas
$$causes(move(x,l),location\;x,l),$$
$$precond(clear\;l,move(x,l))$$
express that the action $move(x,l)$ causes the location of $x$ to become equal
to $l$, and that $l$ should be clear in order for this action to be successful.
Unfortunately, this method requires that all changes caused by a given action,
including its indirect ``ramifications,'' be described explicitly. For instance,
the only fluent directly affected by the action $move(x,l)$ is $location\;x$,
but one of the ramifications of this action is that the fluent $clear\;l$ becomes
false, so that the additional axiom
$$causes(move(x,l),clear\;l,false)$$
is needed. Since changing the value of a fluent can have many ramifications,
this is not satisfactory. In (Lifschitz 1987) this difficulty is alleviated
by dividing all fluents into two groups, ``primitive'' and ``nonprimitive.'' Only
primitive fluents are allowed to occur as the second argument of $causes$;
the nonprimitive fluents are explicitly defined in terms of primitive ones. For
instance, $clear\;l$ can be treated as a nonprimitive fluent defined by
$$value(clear\;l,s)=true≡¬∃x[value(location\;x,s)=l].$$
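With this definition no $causes$ axiom for $clear\;l$ is needed: if $move(x,l)$
is executed successfully, the law of change gives
$value(location\;x,result(move(x,l),s))=l$, and the definition then yields
(assuming that propositional fluents take only the values $true$ and $false$)
$$value(clear\;l,result(move(x,l),s))=false.$$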
\subsection{Creating and Destroying Objects}
We propose to investigate how {\it creating} and {\it destroying} objects
can be described in the situation calculus. This will require
actions of the form $create(spec)$, where the specification $spec$ is a list
of conditions on the object to be created. For instance, the action
$create(\langle isblock\rangle)$ represents creating a block, and
$create(\langle isblock,large,red\rangle)$
represents creating a large red block. The object created when the action
$create(spec)$ is executed in a situation $s$ can be denoted by $new(spec,s)$.
Here are some of the properties of the action $create(spec)$:
$$¬exists(new(spec,s),s),$$
$$exists(new(spec,s),result(create(spec),s)),$$
$$member(f,spec)⊃value(f,\langle new(spec,s),result(create(spec),s)\rangle)=true.$$
The use of the predicate $exists$ here is analogous to its use in
(McCarthy 1979a), except that now it is given an additional situation argument.
Destroying an object $x$ can be described by the formula
$$¬exists(x,result(destroy(x),s)).$$
It remains to be seen how these ideas can be incorporated in the general framework
of the nonmonotonic theory of action and change.
\subsection{More General Ontologies of Time and Action}
The relation between causality and the values of fluents is described in
(Lifschitz 1987) by the ``law of change'' that can be written as
$$success(a,s) ∧ causes(a,f,v) ⊃ value(f,result(a,s))=v.$$
This axiom presupposes a very simple model of time and action: There are no
concurrent actions, time is discrete, actions affect the state of the world
immediately, etc. We propose to investigate the possibility of extending this
formulation to more general ontologies.
1. To allow concurrent actions, we will
interpret the action argument of the $result$ function as a {\it set of
elementary actions} (suggested by Michael Gelfond).
Then actions will be partially ordered by the ``subaction'' relation $≤$.
The law of change will be rewritten along the lines of
$$success(a,s) ∧ causes(a,f,v) ∧ a≤a' ⊃ value(f,result(a',s))=v.$$
This postulate will imply, for instance, that performing the actions
$move(x1,l1)$ and $move(x2,l2)$ concurrently
will lead to the situation in which $x1$ is located at $l1$ and $x2$ at $l2$.
But for some groups of actions, such as
$move(x,l1)$ and $move(x,l2)$,
it should be impossible to derive the effect of performing these actions
concurrently from the postulates describing the effects of performing them
one by one.
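In symbols, with $a=move(x1,l1)$ and $a'=\{move(x1,l1),move(x2,l2)\}$, the
new law gives
$$value(location\;x1,result(a',s))=l1$$
(provided $success(move(x1,l1),s)$), and similarly for $x2$.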
2. A discrete model of time is presupposed by the very syntax of the
situation calculus: When an action $a$ is performed in a situation
$s$, then $result(a,s)$ is the {\it earliest} situation which temporally follows
this event. To allow continuous time, we will have to change this syntax, and
replace $result$ by the predicate $occurs$.
The following tentative formulation of a ``continuous'' law of change postulates
that the fluent in question has the new value in every situation that temporally
follows $s$ and is sufficiently close to it:
$$\eqalign{
occurs(a,s) ∧ &success(a,s) ∧ causes(a,f,v)\cr
&⊃∃s'[s<s'∧∀s''(s<s''≤s' ⊃ value(f,s'')=v)].\cr}
$$
The symbols $<$ and $≤$ represent here the temporal order of situations.
3. To include delayed effects, we can replace the expression in the
brackets in the last formula by
$$lag(a,s)<s'∧∀s''(lag(a,s)<s''≤s' ⊃ value(f,s'')=v),$$
where $lag(a,s)$ denotes the end of the ``latent period'' of $a$ performed in
the situation $s$. We can relate $lag$ to the length of the ``latent period''
by the axiom
$$time(lag(a,s))=time(s)+delay(a,s).$$
\subsection{Strategies}
Besides actions, McCarthy and Hayes (1969) introduce ``strategies,'' which are
constructed from individual actions like programs from atomic statements.
For instance, the strategy that consists of walking 17
blocks south, turning right and then walking till you come to
Chestnut Street, can be written as follows:
$$\eqalign{
{\bf\ begin}&\cr
&face(South);\cr
&n:=0;\cr
b:\quad &{\bf if}\;n=17 {\bf\ then\ go\ to}\;a;\cr
&walk\hbox{-}a\hbox{-}block; n:=n+1;\cr
&{\bf go to}\;b;\cr
a:\quad &turn\hbox{-}right;\cr
c:\quad &walk\hbox{-}a\hbox{-}block;\cr
&{\bf if\ }\;name\hbox{-}on\hbox{-}street\hbox{-}sign\,≠\,\hbox{`Chestnut Street'}
{\bf\ then\ go\ to\ }\; c\cr
{\bf\ end};\;&\cr
}$$
In the literature on reasoning about action this aspect of the situation calculus
has been largely ignored. We are planning to investigate how the available
axiomatizations of elementary actions can be used for proving properties of
strategies.
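For instance, from suitable axioms for $walk\hbox{-}a\hbox{-}block$ and
$turn\hbox{-}right$ one would want to derive a correctness statement roughly
of the form
$$value(street,result(strategy,s))=\hbox{`Chestnut Street'},$$
where $strategy$ abbreviates the program above and $street$ is a fluent giving
the street on which the agent stands; both the notation and the extension of
$result$ to compound strategies are tentative.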
\section{Proposed Work on Other Aspects of the Logic Approach}
\subsection{Formalization of Contexts}
It is a commonplace that the meaning of any expression depends
on context. This holds true just as much for the mathematical
logical sentences used in AI as for sentences of natural language.
However, to our knowledge, context is usually treated informally
in the natural language and philosophical literature. This
won't do for AI programs aimed at approaching human performance, because
it won't allow programs that reason about their own contexts. Even
with lower performance goals it seems worthwhile to formalize
reasoning about contexts.
We expect our formalization to have the following properties.
1. Contexts comprise one of the sorts of individuals in a first
order language, probably supplemented by second order sentences and
reasoning used in circumscription.
2. A basic predicate is $holds$ where $holds(p,c)$ asserts that
proposition $p$ holds in context $c$. (Those familiar with situation
calculus should avoid being misled by a similarity of notation into
assuming that contexts are just a variant of situations. That they are
different and much more general will become apparent in the following
paragraphs). We also expect to use the more general $value(exp,c)$ where
$exp$ denotes an expression, but it is more difficult to figure out how to
do this right, because it isn't clear how to specify the range of
$value(exp,c)$.
3. Contexts are ``rich'' entities, not to be fully described.
Thus the ``normal English language context of an American'' contains {\it
factual assumptions} and {\it linguistic conventions} that not all English
speakers will know. Moreover, most contexts include common philosophical
assumptions that nobody has yet examined, but these might be
transcended when an explicit effort is made by man or machine.
4. As an example of intended usage consider the formula
%
$$holds(on(cat,mat),c17),$$
%
which plays the role of asserting that a particular cat is on a
particular mat on a particular occasion. $c17$ carries information
identifying $cat$ with a particular cat, identifies the
meaning of $mat$, and incorporates the occasion. The occasion may
be real or may be fictitious, e.g. from the Sherlock Holmes stories.
It would be a convention of $c17$ that a time doesn't have to be
specified in this sentence, although $c17$ might be a context that
requires that some other sentences specify times. Even the use
of predicate calculus notation is just a convention of $c17$.
Thus we might have $holds(``The\ cat\ is\ on\ the\ mat.'',c18)$,
where $c18$ is a context with the same factual assumptions as
$c17$ but with different linguistic conventions.
5. The above-mentioned $c17$ would be a rather specialized
context. Assertions made in $c17$, i.e. sentences of the form
$holds(p,c17)$, are connected to less specialized contexts by
sentences of our context language. For example we might have
%
$$meaning(cat,c17) = meaning(``John\ McCarthy's\ cat'',c9)$$
%
where $c9$ is a context in which John McCarthy is identified with
a particular person, but $cat$ is not identified with a particular
beast. $c9$ would also be a context in which English
phrases are meaningful.
6. There is no most general context. The reason for this
is that the context language needs to allow the possibility of
creating new contexts generalizing aspects of the most general
previous context.
A convenient example is provided by a variant of the Searle
(Searle 1980) ``Chinese room'' story. In this variant the man keeps all
the information about how to respond to sequences of Chinese characters in
his head. Searle's confusion results from identifying a personality with
the body it occupies. This is a convention of the English language and
is useful and harmless as long as each body is inhabited by a single
personality. Explaining the ``Chinese room'' story requires distinguishing
the personality that conducts the Chinese dialog and understands
Chinese from the ``normal personality'' of the man, which doesn't
understand Chinese.
We plan to develop operations that specify more general
contexts from the contexts already used. The Chinese room
example requires getting a context that uses different names
for the minds and bodies from some ``normal English context''
that identifies them.
7. We give one example of a rule relating a context in which
certain sentences are explicitly indexed by times to a more specialized
context in which they are not.
%
$$∀p\,c\,t\,(holds(p,spec\hbox{-}time(c,t)) ≡ holds(at(t,p),c))$$
%
Most likely such rules will have to be stated within contexts, but we haven't
studied how to do this yet.
8. The context language will include notions of entering and leaving
a context which generalize natural deduction. Suppose we have
%
$$holds (p,c).$$
%
If we write
%
$$enter\ c,$$
%
as a command, we may then write $p$ by itself. If we subsequently infer
$q$, we are allowed to write
%
$$holds (q,c).$$
%
When we have entered a context there will be restrictions on inferences
analogous to those that apply when an assumption is made in natural deduction.
One increased generality over natural deduction is that
$holds (p,c1)$ and $holds (not\ p,c2)$ behave differently from formulas
like $c1 ⊃ p$ and $c2 ⊃ ¬p$. For example, if
$c1$ is associated with the time 5 pm and $c2$ with 6 pm, and $p$ is
$at(I, office)$, then $holds(p,c1)$ and $holds(not\ p,c2)$ might be used (with
other premises) to infer that I went home between 5 pm and 6 pm. $c1 ⊃ p$
and $c2 ⊃ ¬p$ cannot be used in this way, because the logic has to
regard $p$ as expressing a single proposition that is either true or false.
9. Perhaps it will be convenient to start the reasoning done
by the program by assuming that a certain rather general context has been
entered.
The major goals of the research are to determine the rules that
should be used to relate contexts to their generalizations and
specializations. Many of these rules will be nonmonotonic.
\subsection{Mental Situations}
We intend to study the applicability to mental situations and mental
events of the situation calculus model whose basic formula is
%
$$s' = result(e,s),$$
%
where $s$ is a situation and $e$ is an event.
Mental events include external observations, deductions,
steps of nonmonotonic reasoning, and also observations of the mental
situation itself. A mental situation is a generalization of a database
in which all sentences have their full pedigrees in the database. As a
database it includes facts about the external world, goals in the
external world, facts about what it does and doesn't know and mental
goals including the goal of determining a $y$ satisfying some
predicate $P(x,y)$.
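For instance, if the database of $s$ contains $p$ and $p⊃q$, and $e$ is the
mental event of inferring $q$ (the notation $infer(q)$ is tentative), then
$$result(infer(q),s)$$
is the mental situation whose database also contains $q$, with a pedigree
recording that it was obtained from $p$ and $p⊃q$.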
Here are the reasons for introducing mental situations.
1. We hope to use them in the control of reasoning. If we
regard reasoning steps as mental actions, we can prove properties
of reasoning algorithms in the same way we prove properties of
other programs of action. In the mental situation, we want to
prove the appropriateness of reasoning strategies for achieving
mental goals.
2. The limitations of the situation calculus, e.g. to
discrete non-concurrent events, arise less in thinking than in
external action.
\subsection{General-Purpose Database of Common Sense}
On the basis of our work on the formal treatment of nonmonotonic
reasoning and contexts, we propose to create a small database of general
commonsense facts about physical objects, about change and time, about motion
and space, about agents and their beliefs and actions, about creating and
destroying objects and changing their internal states.
The database will describe a simple model of the world, as in
traditional robot problem solving. But the facts
included in the database will be quite general and not oriented toward any specific
application.
The database will be organized as a complex of interrelated logical theories,
many of them nonmonotonic. Each theory will correspond to one particular
context. For instance, some contexts will house ``static'' facts, not related to
the concept of change. Other contexts will describe change in the traditional,
discrete version of the situation calculus. Still others will use a more
sophisticated continuous model of time. Conceptual
mechanisms will be provided for ``switching'' from one context to another in the
process of formal commonsense reasoning.
The main objective of the project will be to investigate to what degree the
recent achievements in the logic approach to AI give us
an epistemologically sound foundation for various kinds of commonsense
reasoning.
\subsection{Heuristics}
McCarthy has renewed his interest in what might be called
{\it qualitative hill climbing}. Ordinary hill climbing by program
uses a numerical evaluator of some kind or a measure of the distance
from a goal, perhaps with the resources used so far subtracted.
Its biggest problem is the existence of local optima, and special
methods often must be devised to get off the local optima.
Qualitative hill climbing doesn't use numerical evaluation.
Instead it uses a predicate $better(p1,p2)$ that can sometimes say
that position $p2$ is better than position $p1$. The strategy uses
breadth first search or iterative deepening in order to find a
path to a position that is better than the base position. It moves
along that path to the new position and starts over.
The $better$ predicate should have two properties. First, if
the position keeps getting better, then it should lead to achieving
the goal in a reasonable number of stages of improvement. Second,
the amount of lookahead required to find a better position should be
reasonable.
A numerical measure of the value of a position leads trivially
to a $better$ predicate, i.e. we define
%
$$better(p1,p2) ≡ value(p2) > value(p1).$$
%
However, a $better$ predicate is often easier to find since it is
merely a predicate whose transitive closure is an inductive partial
ordering. A numerical measure permits any two positions to be
compared, but this is often unnecessary and doesn't correspond to
what people do. Thus someone who can play chess well or solve
Rubik's cube can't give a numerical value to arbitrary positions
or even tell which of two arbitrary positions is better. His ability
corresponds more to being able to compare a base position to certain
nearby positions and sometimes conclude that the nearby position
represents an improvement.
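As one possible $better$ predicate for the 15 puzzle (our reconstruction,
not necessarily the one used in the experiment described below), fix an order
on the tiles, let $solved(p)$ be the number of tiles in the longest initial
segment of that order already in their home squares, and define
$$better(p1,p2) ≡ solved(p2) > solved(p1).$$
The transitive closure of this relation is an inductive partial ordering, and
at most 15 improvements lead from any position to the goal.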
In Fall 1987 the idea was tried out by David Throop, a graduate
student at the University of Texas at Austin, and it worked quite well
for the 15 puzzle. It was quite easy to devise a $better$ predicate
that corresponded to our intuitive notions of what constitutes an
improvement in the 15 puzzle and which worked in the LISP program.
However, the work with the 15 puzzle still involved
the computer doing much more computation to improve a position than
a human would do. The improvement required is to compare not just
positions but equivalence classes of positions. Doing so corresponds
to human practice and enormously reduces the search.
Actually the $better$ predicate was supplemented by a $worse$
predicate. When the lookahead encounters a position worse than the
base position, it backtracks immediately and doesn't look at the
position's successors. This also corresponds to human heuristics,
and it greatly reduces the amount of search required to find a better
position.
The idea of the $better$ and $worse$ predicates applies also to
games like chess. Then the search has to find a way to force an
improvement in the position. It was tried by McCarthy's student Barbara
Liskov (then Huberman) in her 1968 Stanford PhD thesis. With some
elaboration it worked well for the classical endgame checkmates with a
rook, with two bishops, or with a bishop and knight. However, it now
seems more effective to explore the idea and its modifications in simpler
problems.
We plan to explore qualitative hill climbing further.
\bigbreak
\section{\bf Personnel}
\subsection{\bf Biography of John McCarthy}
\medskip
\noindent
Biographical sketch:
John McCarthy is Professor of Computer Science at Stanford
University. He has been interested in artificial intelligence since
1949 and coined the term in 1955. His main artificial intelligence
research area has been the formalization of common sense knowledge.
He invented the LISP programming language in 1958, developed the concept of
time-sharing in the late fifties and early sixties, and has worked on
proving that computer programs meet their specifications since the early sixties.
His most recent theoretical development, dating from 1978, is the
circumscription method of non-monotonic reasoning. McCarthy received the A. M. Turing
award of the Association for Computing Machinery in 1971 and was elected
President of the American Association for Artificial Intelligence for
1983-84. He received the first Research Excellence Award of the
International Joint Conference on Artificial Intelligence in 1985
and the Kyoto Prize of the Inamori Foundation in November 1988.
\bigskip
\beginlsverbatim
BORN:
September 4, 1927 in Boston, Massachusetts
EDUCATION:
B.S. (Mathematics) California Institute of Technology, 1948
Ph.D. (Mathematics) Princeton University, 1951
HONORS AND SOCIETIES:
American Academy of Arts and Sciences
American Association for the Advancement of Science
American Mathematical Society
Association for Computing Machinery
IEEE
Sigma Xi
Sloan Fellow in Physical Science, 1957-59
ACM National Lecturer, 1961
A. M. Turing Award from Association for Computing Machinery, 1971
Editorial Board, Artificial Intelligence Journal, 1975 - present
Academic Advisor, National Legal Center for Public Information, 1976 - 1980
Board of Directors, Information International, Inc., 1962 - present.
Board of Directors, Inference Corporation, 1983 - present.
Board of Directors, Mad Intelligent Systems, 1987 - present.
Sigma Xi National Lecturer, 1977
Fellow, Center for Advanced Study in the Behavioral Sciences, 1979 - 1980.
President, American Association for Artificial Intelligence, 1983-84
Research Excellence Award, International Joint Conference on Artificial
Intelligence, 1985
Elected to National Academy of Engineering, 1987
Kyoto Prize, 1988
PROFESSIONAL EXPERIENCE:
\nobreak%
Procter Fellow, Princeton University, 1950-51
Higgins Research Instructor in Mathematics, Princeton University, 1951-53
Acting Assistant Professor of Mathematics, Stanford University, Sept. 1953 - Jan. 1955
Assistant Professor of Mathematics, Dartmouth College, Feb. 1955 - June 1958
Assistant Professor of Communication Science, M.I.T., 1958 - 1961
Associate Professor of Communication Science, M.I.T., 1961 - 1962
Professor of Computer Science, Stanford University, 1962 - present
Director, Artificial Intelligence Laboratory, Stanford University, 1965 - 1980
Charles M. Pigott Professor of Engineering, Stanford University, from 1987
Bobby R. Inman Professor of Computer Science, University of Texas, Fall 1987
PROFESSIONAL RESPONSIBILITIES AND SCIENTIFIC INTERESTS:
\nobreak%
With Marvin Minsky organized and directed the Artificial Intelligence Project at M.I.T.
Organized and directed Stanford Artificial Intelligence Laboratory
Developed the LISP programming system for computing with symbolic
expressions, participated in the development of the ALGOL 58
and the ALGOL 60 languages.
Present scientific work is in the fields of Artificial Intelligence,
Computation with Symbolic Expressions, Mathematical Theory of Computation,
Time-Sharing computer systems, formalization of nonmonotonic reasoning,
parallel symbolic computation.
\endlsverbatim
\bigbreak\centerline{\bf Publications}\nobreak\smallskip
\begingroup
\parindent0pt
{\bf McCarthy, John (1951)}: ``Projection Operators and Partial Differential
Equations'' Ph.D. Thesis, Princeton University.
{\bf McCarthy, John (1952)}: ``A Method for the Calculation of Limit Cycles by
Successive Approximation'' in {\it Contributions to the Theory of Nonlinear
Oscillations II}, Annals of Mathematics Study No. 29, Princeton University,
pp. 75-79.
{\bf McCarthy, John (1953)}: ``An Everywhere Continuous Nowhere Differentiable
Function,'' {\it American Mathematical Monthly}, December 1953, p. 709.
{\bf McCarthy, John (1954)}: ``A Nuclear Reactor for Rockets'' {\it Jet Propulsion},
January 1954.
{\bf McCarthy, John (1955)}: ``The Stability of Invariant Manifolds'' Applied
Mathematics Laboratory Technical Report No. 36, Stanford University, 25
pp.
{\bf McCarthy, John (1956)}: ``The Inversion of Functions Defined by Turing
Machines,'' in {\it Automata Studies, Annals of Mathematical Study No. 34,}
Princeton, pp. 177-181.
{\bf McCarthy, John (1956)}: ``Aggregation in the Open Leontief Model,''
in Progress Report of Dartmouth Mathematics Project.
{\bf McCarthy, John (1956)}: ``Measures of the Value of Information,''
{\it Proceedings of the National Academy of Sciences}, September 1956.
{\bf McCarthy, John (1956)}: Co-editor with Dr. Claude E. Shannon of {\it Automata
Studies}, Annals of Mathematics Study No. 34, Princeton University Press.
{\bf McCarthy, John (1960)}: ``Recursive Functions of Symbolic Expressions and their
Computation by Machine,'' {\it Comm. ACM}, April 1960.
{\bf McCarthy, John (1960)}: ``Programs with Common Sense'', in Proceedings of the
Teddington Conference on the Mechanization of Thought Processes, Her Majesty's
Stationery Office, London.
% common[e80,jmc],
% common.tex[e80,jmc]
{\bf McCarthy, John (with 12 others) (1960)}: ``ALGOL 60'', {\it Numerische
Mathematik}, March 1960, also in {\it Comm. ACM}, May 1960 and Jan. 1963.
{\bf McCarthy, John (1961)}: ``A Basis for Mathematical Theory of Computation'',
in {\it Proc. Western Joint Computer Conf.}, May 1961, pp. 225-238.
Later version in Braffort, P. and D. Hirschberg (eds.) {\it Computer
Programming and Formal Systems}, North-Holland Publishing Co. (1963).
{\bf McCarthy, John (1962)}: ``Time-Sharing Computing Systems,'' in {\it Management
and the Computer of the Future}, Martin Greenberger (ed.), MIT Press.
{\bf McCarthy, John (with Paul Abrahams, Daniel Edwards, Timothy
Hart and Michael Levin) (1962)}:
{\it LISP 1.5 Programmer's Manual}, M.I.T. Press, Cambridge, Mass.
{\bf McCarthy, John (1962)}: ``Computer Programs for Checking Mathematical
Proofs'', {\it Amer. Math. Soc. Proc. of Symposia in Pure Math.}, Vol. 5.
{\bf McCarthy, John (1963)}: ``Towards a Mathematical Theory of Computation'',
in Proc. IFIP Congress 62, North-Holland, Amsterdam.
{\bf McCarthy, John (1963)}: ``A Basis for a Mathematical Theory of Computation'',
in P. Braffort and D. Hirschberg (eds.), {\it Computer Programming and
Formal Systems}, North-Holland Publishing Co., Amsterdam, pp. 33-70.
{\bf McCarthy, John (1963)}: ``A Time-Sharing Debugging System for a Small
Computer'', (with Boilen, Fredkin and Licklider), Proc. AFIPS 1963 Spring
Joint Computer Conf., Spartan Books, Detroit, pp. 51-57.
{\bf McCarthy, John (1963)}: ``The Linking Segment Subprogram Language and
Linking Loader Programming Languages'', Comm. ACM, July 1963. (with F.
Corbato and M. Daggett),
{\bf McCarthy, John (1965)}: ``Problems in the Theory of Computation'', in Proc.
IFIP Congress 65, Spartan, Washington, D.C..
{\bf McCarthy, John (1966)}: ``A Formal Description of a Subset of Algol'',
{\it Formal Language Description Languages for Computer Programming},
T.B. Steel, Jr. (ed.), North-Holland Publ. Co., Amsterdam, pp. 1-12.
{\bf McCarthy, John (1968)}: ``Time-Sharing Computer Systems'', in
{\it Conversational Computers}, William Orr (ed), Wiley Publishing Company.
{\bf McCarthy, John (1966)}: ``Information'', {\it Scientific
American}, Vol. 215, No. 9.
{\bf McCarthy, John (1967)}: ``Correctness of a Compiler for Arithmetic
Expressions'' (with James Painter),
{\it Proceedings of Symposia in Applied Mathematics, Volume XIX},
American Mathematical Society.
{\bf McCarthy, John (1967)}: ``THOR - A Display Based Time-Sharing System'',
(with D. Brian, G. Feldman, and John Allen)
{\it AFIPS Conf. Proc.}, Vol. 30, (FJCC) Thompson, Washington, D.C..
{\bf McCarthy, John (1967)}: ``Computer Control of a Hand and Eye'', in {\it Proc.
Third All-Union Conference on Automatic Control (Technical Cybernetics)},
Nauka, Moscow, (Russian).
{\bf McCarthy, John (1968)}: ``Programs with Common Sense,'' in M. Minsky (ed.),
{\it Semantic Information Processing}, M.I.T. Press, Cambridge, Mass.
{\bf McCarthy, John (1968)}: ``A Computer with Hands, Eyes, and Ears,''
(with L. Earnest, D. Reddy, P. Vicens)
{\it Proc. AFIPS Conf.} (FJCC).
{\bf McCarthy, John and P.J. Hayes (1969)}: ``Some Philosophical Problems from
the Standpoint of Artificial Intelligence'', in D. Michie (ed), {\it Machine
Intelligence 4}, American Elsevier, New York, NY.
% phil.tex[ess,jmc] with slight modifications
{\bf McCarthy, John (1972)}: ``The Home Information Terminal,'' in {\it Man and
Computer}, Proceedings of an International Conference, Bordeaux 1970, S. Karger, N.Y.
{\bf McCarthy, John (1973)}: ``Mechanical Servants for Mankind,'' in {\it Britannica
Yearbook of Science and the Future}.
{\bf McCarthy, John (1974)}:
Book Review: ``Artificial Intelligence: A General Survey'' by Sir James
Lighthill, in {\it Artificial Intelligence}, Vol. 5, No. 3.
% LIGHT.RE5[ESS,JMC] 25-Jul-75 Lighthill review
{\bf McCarthy, John (1974)}: ``Modeling Our Minds'' in {\it Science Year 1975, The
World Book Science Annual}, Field Enterprises Educational Corporation,
Chicago, ILL.
{\bf McCarthy, John (1976)}: ``The Home Information Terminal,'' invited presentation,
AAAS Annual Meeting, Feb. 18-24, 1976, Boston.
{\bf McCarthy, John (1976)}: ``An Unreasonable Book,'' a review of {\it Computer
Power and Human Reason}, by Joseph Weizenbaum (W.H. Freeman and Co., San
Francisco, 1976) in {\it SIGART Newsletter 58}, June 1976, also in {\it Creative
Computing}, Chestnut Hill, Massachusetts, 1976 and in ``Three Reviews of Joseph
Weizenbaum's {\it Computer Power and Human Reason}'', (with Bruce
Buchanan and Joshua
Lederberg), Stanford Artificial Intelligence Laboratory Memo 291, Computer
Science Department, Stanford, CA.
{\bf McCarthy, John (1977)}:
Review: {\it Computer Power and Human Reason}, by Joseph Weizenbaum (W.H.
Freeman and Co., San Francisco, 1976) in {\it Physics Today}.
{\bf McCarthy, John (1977)}:
``The Home Information Terminal'' {\it Grolier Encyclopedia}.
{\bf McCarthy, John (1977)}:
``On The Model Theory of Knowledge'' (with M. Sato, S. Igarashi, and
T. Hayashi), {\it Proceedings of the Fifth International Joint Conference
on Artificial Intelligence}, M.I.T., Cambridge, Mass.
{\bf McCarthy, John (1977)}:
``Another SAMEFRINGE'', in {\it SIGART Newsletter} No. 61, February 1977.
% samefr[f76,jmc]
{\bf McCarthy, John (1977)}:
``History of LISP'', in Proceedings of the ACM Conference on the
History of Programming Languages, Los Angeles.
% lisp[f77,jmc]
{\bf McCarthy, John (1977)}:
``Epistemological Problems of Artificial Intelligence'', {\it Proceedings
of the Fifth International Joint Conference on Artificial
Intelligence}, M.I.T., Cambridge, Mass.
% ijcai.c[e77,jmc]
{\bf McCarthy, John (1979)}:
``Ascribing Mental Qualities to Machines'' in {\it Philosophical Perspectives
in Artificial Intelligence}, Ringle, Martin (ed.), Harvester Press, July 1979.
% .<<aim 326, MENTAL[F76,JMC],
% mental.tex[f76,jmc]>>
{\bf McCarthy, John (1979)}:
``First Order Theories of Individual Concepts and Propositions'',
in Michie, Donald (ed.) {\it Machine Intelligence 9}, (University of
Edinburgh Press, Edinburgh).
% .<<aim 325, concep.tex[e76,jmc]>>
{\bf Cartwright, Robert and John McCarthy (1979)}:
``Recursive Programs as Functions in a First Order Theory'',
in {\it Proceedings of the International Conference on Mathematical Studies of
Information Processing}, Kyoto, Japan.
% .<<aim 324, FIRST.NEW[W77,JMC]>>
{\bf McCarthy, John (1980)}:
``Circumscription - A Form of Non-Monotonic Reasoning'', {\it Artificial
Intelligence}, Volume 13, Numbers 1,2, April.
% .<<aim 334, circum.new[s79,jmc], cirnew.tex[s79,jmc]>>
{\bf McCarthy, John and Carolyn Talcott (1980)}: {\it LISP - Programming and
Proving}, course notes, Stanford University. (to be published as a book).
% .<<The references in this bibliography (BIOJMC[1,JMC]) should be in a
% .uniform style, because I often copy them to papers. The last few are
% .correct. The publication in italics and first names spelled out.
% .>>
{\bf McCarthy, John (1982)}: ``Common Business Communication Language'', in
{\it Textverarbeitung und B\"urosysteme}, Albert Endres and J\"urgen Reetz, eds.
R. Oldenbourg Verlag, Munich and Vienna 1982.
% cbcl[f75,jmc]
{\bf McCarthy, John (1982)}: {\it Coloring Maps and the Kowalski Doctrine},
Report No. STAN-CS-82-903, Computer Science Department, Stanford University,
Stanford, CA 94305.
% maps[e81,jmc]
{\bf McCarthy, John (1983)}: ``AI Needs More Emphasis on Basic Research'', {\it AI
Magazine}, Volume 4, Number 4, Winter 1983.
% presid.1[f83,jmc]
{\bf McCarthy, John (1983)}: ``Some Expert Systems Need Common Sense'',
in {\it Computer Culture: The Scientific, Intellectual and Social Impact
of the Computer}, Heinz Pagels, ed.
vol. 426, Annals of the New York Academy of Sciences.
%paper
%presented at New York Academy of Sciences Symposium.
% common[e83,jmc]
%common.tex[e83,jmc]
{\bf McCarthy, John (1983)}: ``The Little Thoughts of Thinking Machines'',
{\it Psychology Today}, Volume 17, Number 12, December 1983.
% psycho.4[e83,jmc]
{\bf Gabriel, Richard and John McCarthy (1984)}:
``Queue-based Multi-processing Lisp''
in {\it Proceedings of the 1984 ACM Symposium on Lisp and Functional Programming},
August 1984.
{\bf McCarthy, John (1986)}:
``Applications of Circumscription to Formalizing Common Sense Knowledge''
{\it Artificial Intelligence}, April 1986
% circum.tex[f83,jmc]
{\bf McCarthy, John (1987)}:
``Review of {\it Artificial Intelligence --- The Very Idea} by John
Haugeland'' to appear in {\it SIAM News}.
% haugel[f86,jmc]
{\bf McCarthy, John (1987)}:
``Generality in Artificial Intelligence'', {\it Communications of the ACM}.
Vol. 30, No. 12, pp. 1030-1035
% genera[w86,jmc]
{\bf McCarthy, John (1987)}:
``Mathematical Logic in Artificial Intelligence'', in
{\it Daedalus}, vol. 117, No. 1, American Academy of Arts and Sciences,
Winter 1988.
% logic.2[w87,jmc]
{\bf McCarthy, John (1988)}:
``Artificial Intelligence'', for {\it Collier's Encyclopedia}.
% artifi.3[w88,jmc]
\subsection{\bf Biography of Vladimir Lifschitz}
\smallskip
\beginlsverbatim
BORN:
May 30, 1947 in Moscow, USSR
EDUCATION:
Degree in Mathematics (B.S., M.S.~equivalent) from Leningrad University,
Leningrad, USSR, 1968
Ph.D. in Mathematics from the Steklov Mathematical Institute, Leningrad, USSR,
1971
EXPERIENCE:
Senior Research Associate, Stanford University, 1985--now
Associate Professor of Computer Science, University of Texas -- El Paso, 1982--84
Assistant Professor of Computer Science, University of Texas -- El Paso, 1979--82
Visiting Assistant Professor, Brigham Young University, 1977--79
Lecturer and Research Associate, Stanford University, 1976--77
Research Associate, Institute of Mathematics and Economics, Leningrad, 1971-74
\endlsverbatim
{
\parindent=20pt
\noindent{PUBLICATIONS:}
\bigskip
\item{1.} ``The normal form for deductions in predicate calculus with equality
and function symbols'',
{\sl Seminars in Mathematics} {\bf 4}, Leningrad: Steklov
Math.~Institute, 1967. {\sl Math.~Rev.}~{\bf 38} $\# 5583$.
\smallskip
\item{2.} ``Certain reduction classes and undecidable theories'',
{\sl Seminars in Mathematics} {\bf 4}, Leningrad: Steklov
Math.~Institute, 1967. {\sl Math.~Rev.}~{\bf 38} $\# 5603$.
\smallskip
\item{3.} ``Deductive validity and reduction classes'',
{\sl Seminars in Mathematics} {\bf 4}, Leningrad: Steklov
Math.~Institute, 1967. {\sl Math.~Rev.}~{\bf 38} $\# 5604$.
\smallskip
\item{4.} ``The decision problem for some constructive theories of equality'',
{\sl Seminars in Mathematics} {\bf 4}, Leningrad: Steklov
Math.~Institute, 1967. {\sl Math.~Rev.}~{\bf 38} $\# 65$.
\smallskip
\item{5.} ``The decision problem for some constructive theories of equality'',
{\sl Seminars in Mathematics} {\bf 4}, Leningrad: Steklov
Math.~Institute, 1967. {\sl Math.~Rev.}~{\bf 38} $\# 4281$.
\smallskip
\item{6.} ``Constructive mathematical theories consistent with classical logic'',
{\sl Proc. Steklov Math.~Institute} {\bf 98}, 1968. {\sl Math.~Rev.}~{\bf 36}
$\# 6265$.
\smallskip
\item{7.} ``A refinement of the form of deductions in predicate calculus with
equality and function symbols'',
{\sl Proc.~Steklov Math.~Institute} {\bf 98}, 1968. {\sl Math.~Rev.}~{\bf 40}
$\# 5415$.
\smallskip
\item{8.} ``Constructive analytic functions of the real variable'',
{\sl Seminars in Mathematics} {\bf 8}, Leningrad: Steklov
Math.~Institute, 1968. {\sl Math.~Rev.}~{\bf 43} $\# 7323$.
\smallskip
\item{9.} ``On a set of zeroes of a constructive power series in a real variable'',
{\sl Seminars in Mathematics} {\bf 16}, Leningrad: Steklov
Math.~Institute, 1969. {\sl Math.~Rev.}~{\bf 41} $\# 5193$.
\smallskip
\item{10.} ``The investigation of constructive functions by the method of
fillings'', {\sl Journal of Soviet Math.} {\bf 1}, 1972.
{\sl Math.~Rev.}~{\bf 48} $\# 3714$.
\smallskip
\item{11.} ``A locally analytic constructive function which is not analytic'',
{\sl Soviet Math.~Dokl.}~{\bf 13}, 1972.
{\sl Math.~Rev.}~{\bf 45} $\# 5320$.
\smallskip
\item{12.} ``A non-compact closed sphere in a compact constructive metric space'',
{\sl Seminars in Mathematics} {\bf 32}, Leningrad: Steklov
Math.~Institute, 1972. (With V.~P.~Chernov).
{\sl Math.~Rev.}~{\bf 49} $\# 1487$.
\smallskip
\item{13.} ``$CT_0$ is stronger than $CT_0!$'', {\sl Proc. Amer. Math. Soc.}
{\bf 73}(1), 1979.
\smallskip
\item{14.} ``An intuitionistic definition of classical natural numbers'',
{\sl Proc. Amer. Math. Soc.}
{\bf 77}(3), 1979.
\smallskip
\item{15.} ``Semantical completeness theorems in logic and algebra'',
{\sl Proc. Amer. Math. Soc.}
{\bf 79}(1), 1980.
\smallskip
\item{16.} ``The efficiency of an algorithm of integer programming:
a probabilistic analysis'',
{\sl Proc. Amer. Math. Soc.}
{\bf 79}(1), 1980.
\smallskip
\item{17.} ``The number of increasing subsequences of the random permutation'',
{\sl Journal of Combinatorial Theory}, Ser.~A,
{\bf 31}, 1981. (With B. Pittel).
\smallskip
\item{18.} ``Constructive assertions in an extension of classical mathematics'',
{\sl Journal of Symbolic Logic}
{\bf 47}(2), 1982.
\smallskip
\item{19.} ``A note on the complexity of a partition algorithm'',
{\sl Information Processing Letters}
{\bf 17}, 1983. (With L.~Pesotchinsky).
\smallskip
\item{20.} ``The worst-case performance and the most probable performance of a
class of set-covering algorithms'',
{\sl SIAM Journal on Computing}
{\bf 12}(2), 1983. (With B.~Pittel).
\smallskip
\item{21.} ``On verification of programs with goto statements'',
{\sl Information Processing Letters}
{\bf 18}, 1984.
\smallskip
\item{22.} ``Some results on circumscription'', in:
{\sl Non-Monotonic Reasoning Workshop}, New Paltz, 1984.
\smallskip
\item{23.} ``Calculable natural numbers'', in:
{\sl Intensional Mathematics}, North-Holland, 1985.
\smallskip
\item{24.} ``Computing circumscription'', in:
{\sl Proc. IJCAI-85}, 1985.
\smallskip
\item{25.} ``Closed-world data bases and circumscription'',
{\sl Artificial Intelligence}, {\bf 27}, 1985.
\smallskip
\item{26.} ``On the satisfiability of circumscription'',
{\sl Artificial Intelligence}, {\bf 28}, 1986.
\smallskip
\item{27.} {\sl Mechanical Theorem Proving in the USSR: Leningrad School},
Delphic Associates, 1986.
\smallskip
\item{28.} ``On the semantics of STRIPS'', in:
{\sl Reasoning about Actions and Plans},
Morgan Kaufmann, Los Altos, 1987.
\smallskip
\item{29.} ``Pointwise circumscription: Preliminary report'',
in: {\sl Proc. AAAI-86}, 1986.
\smallskip
\item{30.} ``On the declarative semantics of logic programs with negation'',
in: {\sl Workshop on Foundations of Logic Programming and Deductive Databases},
Washington, D.C., 1986.
\smallskip
\item{31.} ``Formal theories of action: preliminary report'',
in: {\sl Proc. IJCAI-87}, 1987.
\smallskip
\item{32.} ``Formal theories of action'', in: {\sl The Frame Problem in Artificial
Intelligence, Proceedings of the 1987 Workshop}, 1987.
\smallskip
\item{33.} ``Circumscriptive theories: a logic-based framework for commonsense
knowledge (preliminary report)'', in: {\sl Proc. AAAI-87}, 1987.
\smallskip
\item{34.} ``Pointwise circumscription'', in: {\sl Readings in Non-Monotonic
Reasoning}, Morgan Kaufmann, Los Altos, 1987.
\smallskip
\item{35.} ``What is the inverse method?'', to appear in {\sl Journal of Automated
Reasoning}.
\smallskip
\item{36.} ``The stable model semantics for logic programming'', in: {\sl Proceedings
of the Fifth International Conference and Symposium on Logic Programming}, 1988.
\smallskip
\item{37.} ``Rules for computing circumscription''
(in preparation).
\smallskip
\item{38.} ``Compiling circumscriptive theories into logic programs: Preliminary
Report'', in: {\sl Proc. AAAI-88}, 1988.
\bigskip
\noindent{INVITED PAPERS AT SCHOLARLY CONFERENCES:}
\medskip
\item{1.} ``Calculable Natural Numbers'', Conference on Constructive Mathematics,
New Mexico State University, 1980.
\smallskip
\item{2.} ``Classical Numbers from the Constructive Standpoint'',
L.~E.~J.~Brouwer Centennial, Netherlands, 1981.
\smallskip
\item{3.} ``Models of Circumscription'',
Workshop on Logic and Computer Science,
University of Kentucky, 1985.
\smallskip
\item{4.} ``On the Semantics of STRIPS'',
Workshop on Planning and Reasoning about Action, Timberline, 1986.
\smallskip
\item{5.} ``On the Declarative Semantics of Logic Programs with Negation'',
Workshop on the Foundations of Logic Programming and Deductive Databases,
Washington, D.C., 1986.
\bigskip
\noindent{AWARDS:}
\medskip
\noindent
Publishers' Prize, The Tenth International Joint Conference on Artificial
Intelligence (Milan, 1987).
\bigskip
\noindent{MEMBERSHIP IN PROFESSIONAL ORGANIZATIONS:}
\medskip
\noindent Association for Computing Machinery.
\smallskip
\noindent American Association for Artificial Intelligence.
\smallskip
\noindent Association for Symbolic Logic.
\smallskip
\noindent{OTHER PROFESSIONAL ACTIVITIES:}
\medskip
\noindent Member of the Editorial Board of the journal {\sl Artificial
Intelligence.}
\smallskip
\noindent Member of the
Committee on Translations of the Association for Symbolic Logic.
\smallskip
\noindent Member of the program committees of the Second Annual Symposium on Logic
in Computer Science, of the Second International Symposium on Methodologies for
Intelligent Systems, and of the Seventh National Conference on Artificial
Intelligence.
\smallskip
\noindent Refereed papers for the {\sl
ACM Journal,
Annals of Pure and Applied Logic,
Computational Intelligence,
International Joint Conferences on Artificial Intelligence,
Journal of Logic Programming,
Journal of Symbolic Logic,
Mathematics of Operations Research
}.
}\endgroup
\centerline{\bf Appendix A. AI Needs More Emphasis on Basic Research}
\bigskip
Too few people are doing basic research in AI relative to the
number working on applications. The ratio of basic to applied work is lower in
AI than in the older sciences and in computer science generally.
This is unfortunate, because reaching human level artificial intelligence
will require fundamental conceptual advances. Even the applied goals
proposed by various groups in the U.S., Europe and Japan
for the next ten years are not just engineering extrapolations from
the present state of science. Their realization will require more basic
research than is now being done.
Jon Doyle put it this way in a recent net message: ``$\dots$ tentative,
but disturbing conclusion: that the students interested in AI are not
very interested in fundamental questions, open problems, and long term
research, but instead are eager to get in on big, build-it-now
projects in expert systems and natural language interfaces.'' He
was definite about CMU, but he conjectured that the situation was similar
elsewhere, and I suppose student preferences do not differ much from
place to place.
I'll begin with a few recriminations and then try to be more
constructive. First, the Government, specifically DARPA and NSF, had
a fit of extreme `practicality' in the early 1970s. The Mansfield
amendment required DARPA to claim short term military relevance for
what it supported, and NSF diverted much of its resources to `Research
Applied to National Needs'. The older sciences were able to resist this
in NSF but lost their DARPA support completely. AI, which was more
dependent on DARPA than the others were, survived, but wounded. The
situation has improved in both places in recent years.
Second, the opportunities to make money have perhaps lured
some people away from research per se. I don't really know the
extent to which this is true. Maybe they were tired of research.
Third, much of the theoretical work in AI is beside the point
and unlikely to lead to advances toward human level intelligence.
The mathematically talented like well-defined conjectures
wherein the mere statement of the result that has been proved or
of the asymptotic behavior of the algorithm discovered wins instant
scientific recognition.
AI badly needs mathematical and logical theory,
but the theory required involves
conceptual innovations --- not just mathematics. We won't reach
human level intelligence by more algorithms reducing the complexity
of a problem from $n↑2$ to $n\log n$ and still less by proofs that
yet another problem is unsolvable or NP-complete.
Of course, these results are often very significant as mathematics
or computer science.
Fourth, like many fields AI is given to misguided enthusiasms
in which large numbers of people make the same errors. For example,
much of the present work in natural language processing seems
misguided to me. There is too much emphasis on syntax and not enough on
the semantics. Natural language front ends on programs that convert
between existing AI formalisms and English miss the point.
What we can learn from natural language is not how to express in English
what we already know how to express in computerese. Rather we must
study those ideas expressible in natural language that no-one knows
how to represent at all in a computer.
We also won't reach human level intelligence by building
larger and larger production systems involving more and more facts
all on the same level. Of course, these systems of limited intelligence
may have substantial practical utility.
Now that I've finished grumbling, I'll try to be constructive.
1. People beginning their research careers should think about the
long term goals of AI and should think how to apply their own talents
in the best way. If they can do first class basic research they should.
2. In my opinion, the key problem at present is the formalization
of common sense knowledge and reasoning ability. It still looks to me
as though separating epistemology from heuristics will pay off.
3. We need to think hard about how to make experiments that
are really informative. At present the failures are more important
than the successes, because they often tell us that the intellectual
mechanisms we imagined would intelligently solve certain problems
are inadequate.
4. We need good problem domains --- the AI analog of what the
Drosophila did for genetics. The Soviet computer scientist A. S.
Kronrod once referred to chess as the Drosophila of artificial
intelligence, because it permitted comparison of human and artificial
intellectual mechanisms. Unfortunately, chess was discouraged as
a serious problem domain, and most chess programming is carried on
at the level of sport rather than science. In particular, there is
little publication about the intellectual mechanisms involved, and
the race often involves merely faster hardware.
5. I also believe there is a large payoff in a more general
analysis of the concept of pattern.
Finally, let me cheerfully admit that general harangues
like this one are no substitute for scientific papers setting
forth specific problems in detail.
I hope that other members of AAAI will express their own
opinions about what the basic research problems are.
\bigbreak
\centerline{\bf Appendix B. Mathematical Logic in Artificial Intelligence}
\bigskip
This article concerns computer programs that represent information
about their problem domains in mathematical logical languages and use
logical inference to decide what actions are appropriate to achieve their
goals.
Mathematical logic isn't a single language. There are many
kinds of mathematical logic, and even choosing a kind doesn't specify
the language. This is done by declaring what {\it non-logical} symbols
will be used and what sentences will be taken as axioms. The non-logical
symbols are those that concern the concrete subject matter, e.g. objects
and their locations and motions.
Whatever the choice of symbols,
all kinds of mathematical logic share two ideas. First, it must
be mathematically definite what strings of symbols are considered
formulas of the logic. Second, it must be mathematically definite what
inferences of new formulas from old ones are allowed. This permits
writing computer programs that decide what combinations of symbols are
sentences and what inferences are allowed in a particular logical
language.
Mathematical logic has become an important branch of
mathematics, and most logicians work on problems arising from the internal
development of the subject. It has also been applied to studying the
foundations of mathematics, and this has been its greatest success. Its
founders, Aristotle, Leibniz, Boole and Frege, also wished to apply it to
making reasoning about human affairs more rigorous. Indeed Leibniz was
explicit about his goal of replacing argument by calculation. However,
expressing knowledge and reasoning about the common sense world in
mathematical logic has encountered difficulties that seem to require
extensions of the basic concepts of logic, and these extensions are only
beginning to develop.
If a computer is to store facts about the world and reason with
them, it needs a precise language, and the program has to embody a
precise idea of what reasoning is allowed, i.e. of how new formulas may be
derived from old. Therefore, it was natural to try to use mathematical
logical languages to express what an intelligent computer program knows
that is relevant to the problems we want it to solve and to make the
program use logical inference in order to decide what to do. The first
proposal to use logic in artificial intelligence
for expressing
what a program knows and how it should reason was in a paper I wrote in
1959.\footnote{$↑*$}{The best general text on the logic approach to AI is
(Genesereth and Nilsson 1987).}
The problem of proving logical
formulas as a domain for AI had already been studied. In this paper I said:
\bigskip
\begingroup\narrower\narrower
% COMMON.TEX[E80,JMC] TeX version Programs with Common Sense
The {\it advice taker} is a proposed program for solving problems by
manipulating sentences in formal languages. The main difference between
it and other programs or proposed programs for manipulating formal
languages (the {\it Logic Theory Machine} of Newell, Simon and Shaw and
the Geometry Program of Gelernter) is that in the previous programs the
formal system was the subject matter but the heuristics were all embodied
in the program. In this program the procedures will be described as much
as possible in the language itself and, in particular, the heuristics are
all so described.
The main advantage we expect the {\it advice taker} to have is that
its behavior will be improvable merely by making statements to it,
telling it about its symbolic environment and what is wanted from it. To
make these statements will require little if any knowledge of the program
or the previous knowledge of the {\it advice taker}. One will be able to
assume that the {\it advice taker} will have available to it a fairly wide
class of immediate logical consequences of anything it is told and its
previous knowledge. This property is expected to have much in common with
what makes us describe certain humans as having {\it common sense}. We
shall therefore say that {\it a program has common sense if it automatically
deduces for itself a sufficiently wide class of immediate consequences of
anything it is told and what it already knows} (McCarthy 1959).
\par\endgroup
\bigskip
The advice taker prospectus, ambitious in 1960, would
be considered ambitious today and is still far from being immediately
realizable. Mathematical logic is especially far from the goal of
expressing the heuristics in the same language as the facts
on which the heuristics must act. Yet
the main reasons for using logical sentences extensively in AI
are better understood by researchers today than in 1959. Expressing
information in declarative sentences is far more flexible than
expressing it in segments of computer program or in tables. Sentences
can be true in much wider contexts than specific programs can be useful.
The supplier of a fact does not have to understand much about how the
receiver functions or how or whether the receiver will use it. The same
fact can be used for many purposes, because the logical consequences of
collections of facts can be available.
Existing computer programs come more or less close to this goal, depending
on the extent to which they use the formalisms of logic.
I shall begin by describing four levels of their use.
1. A machine on the lowest level uses no logical sentences. All its ``beliefs''
are implicit in its state. Nevertheless, it is often appropriate to
ascribe beliefs and goals to the program. A missile may believe its target
is friendly and abandon the goal of hitting it.
One can often usefully say that a certain machine does
what it thinks will achieve its goals. Such ascription of mental qualities to
machines is discussed in
(Dennett 1971), (McCarthy 1979) and (Newell 1980).
The intent of the machine's designers and the way it can be expected to
behave may be more readily described in terms of intention
than by a purely physical description.
The relation between the physical and the {\it intentional}
descriptions is most readily understood in simple systems that admit
readily understood descriptions of both kinds. Take a thermostat as an
example. We might say that when it believes the temperature is too hot,
it turns on the cooling system in order to achieve its goal of getting the
right temperature. Some finicky philosophers object to such ascription.
Unless a system has a full human mind, they
contend, it shouldn't be regarded as having any mental qualities
at all. This is like omitting the numbers 0 and 1 from the number system
on the grounds that numbers aren't required to count sets with no elements
or with one element. Of course, ascribing beliefs to machines (and people)
is more important when our physical knowledge is inadequate to explain or
predict behavior.
Much more can be said about ascribing
mental qualities to machines, but that's not what AI is mainly concerned with
today.
2. The next level of use of logic involves computer programs that
use sentences in machine memory to represent their beliefs but use
other rules than ordinary logical inference to reach conclusions.
New sentences are often obtained from the old ones
by ad hoc programs. Moreover, the sentences that appear in memory are
from a program-dependent subset of the logical language being used.
Adding certain true sentences in the language may even spoil the functioning of
the program. Logic is used at this second level in ``expert systems,''
programs that consist of knowledge bases (e.g., lists of disease symptoms
in medical expert systems) and inference engines (which contain rules, in the
form of explicit instructions to the machine, on how to manipulate the information
in the knowledge base). In comparison with the languages of first-order logic,
languages used at this level are often rather unexpressive.
For example, they may not admit quantified sentences (i.e., sentences including
``for all'' and ``there exists''), and they may represent general rules in
a separate notation. Often rules cannot arise as
consequences of the program's own action; they must all have been put in by a
``knowledge engineer''. Sometimes the reason programs have this form is
just ignorance, but the usual reason for the restriction is the practical
one of making the program run fast and deduce just the kinds of
conclusions its designer anticipates. Most often the rules are
implications used in just one direction, e.g. the contrapositive of an
implication is not used. I believe the need for such specialized inference
will turn out to be temporary and will be reduced or eliminated by
improved ways of controlling general inference --- for example, by allowing the
heuristic rules to be also expressed as sentences as advocated in
the preceding extract from my 1959 paper.
3. The third level uses first order logic as well as logical
deduction. Usually the sentences are represented as clauses, and the
deduction methods are based on J. Alan Robinson's (1965) method of
resolution. A fact in one such program's data base might be:
\medskip
\centerline{(for all ($x$) (if (and (inst $x$ vegetable) (color $x$ purple))}
\nobreak
\centerline{(inst $x$ eggplant)))}
\medskip\noindent
Translated into more common language, the fact reads: ``All purple vegetables
are eggplants.'' Its structure is typical of if-then clauses in logical
data bases: given any $x$, if $x$ satisfies the stated conditions, then $x$
ensures a certain result --- (for all ($x$) (if (conditions) then (result))).
In the example, $x$ must satisfy two conditions: (inst $x$ vegetable) and
(color $x$ purple). The first condition means that $x$ must be a specific
instance of the class of vegetables, and the second means that the color of $x$
must be purple. The result, (inst $x$ eggplant), means that $x$ is an instance
of an eggplant. Armed with this fact, the program might seem ready
to take on this task:
\medskip
\centerline{(inst Gertrude vegetable)}
\centerline{(color Gertrude purple)}
\centerline{(SHOW: (inst Gertrude eggplant))}
\medskip\noindent
Translated, the task is: Given the fact that Gertrude is a purple vegetable,
show that Gertrude is an eggplant. But with just the logical fact, the program
can do nothing with the task. It needs a method for reasoning from general
statements about nondescript $x$'s to specific statements about Gertrude. The
reasoning of Robinson's resolution method prescribes a way to substitute
Gertrude for $x$ and thereby unify the clauses in the data base with those
in the task.
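The substitution step is easily mechanized. Here is a sketch in Python
(an illustration of the idea only, not the procedure of any particular
theorem prover; the clause representation and the function names are
invented for the example), which matches the conditions of the rule
against the facts about Gertrude and instantiates the conclusion:
\medskip
\begingroup\parindent=0pt\tt
\catcode`\_=12 \catcode`\#=12
\obeylines\obeyspaces
# Sketch of the substitution step in the eggplant example.
# Literals are tuples; variables are strings beginning with "?".
conditions = [("inst", "?x", "vegetable"), ("color", "?x", "purple")]
conclusion = ("inst", "?x", "eggplant")
facts = [("inst", "Gertrude", "vegetable"), ("color", "Gertrude", "purple")]

def unify(pattern, fact, binding):
    # Extend binding so that pattern matches fact, or return None.
    if len(pattern) != len(fact):
        return None
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if binding.get(p, f) != f:
                return None
            binding = dict(binding)
            binding[p] = f
        elif p != f:
            return None
    return binding

def apply_rule(conditions, conclusion, facts):
    # Try every way of matching all the conditions against the facts.
    bindings = [dict()]
    for cond in conditions:
        bindings = [b2 for b in bindings for fact in facts
                    for b2 in [unify(cond, fact, b)] if b2 is not None]
    return [tuple(b.get(t, t) for t in conclusion) for b in bindings]

print(apply_rule(conditions, conclusion, facts))
# prints the single conclusion ('inst', 'Gertrude', 'eggplant')
\endgroup
\medskip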
Examples of such programs used commercially are ``expert system shells''
(ART, KEE, OPS-5) --- computer programs that create generic expert systems.
You tell the program what facts you want in the data base; it converts
them into logical statements and then uses the heuristics in its own
inference engine to create an inference engine tailored to the facts you
supplied.
The third level of logic is less used for practical
purposes than is level two, because techniques for controlling the
reasoning are still insufficiently developed, and it is common for the
program to generate many useless conclusions before it reaches a desired
solution. Indeed unsuccessful experience (Green 1969) with this method
led to more restricted uses of logic, e.g. the STRIPS system of (Fikes
and Nilsson 1971).
In this connection it is important to mention logic programming,
first introduced in Microplanner (Sussman et al., 1971)
and from different points of view by Robert Kowalski (1979) and Alain
Colmerauer in the early 1970s.
A recent text is (Sterling and Shapiro 1986). Microplanner
was a rather unsystematic collection of tools, whereas Prolog relies
almost entirely on one kind of logic programming, but the main idea
is the same. If one uses a restricted class of sentences, the so-called
Horn clauses, then it is possible to use a restricted form of logical
deduction. This eases the control problem and makes it possible
for the programmer to anticipate the course the deduction will take.
The problem is that only certain kinds of facts are conveniently
expressed as Horn clauses. Nevertheless,
expressibility in Horn clauses is an important property of a set
of facts and logic programming has been successfully used for many
applications (although it seems unlikely to dominate AI programming
as some of its advocates hope).
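The control advantage of the Horn restriction can be seen even in the
propositional case. The following sketch (mine, with invented proposition
names; Prolog itself chains backward from a goal rather than forward as
here) computes everything derivable by repeatedly applying any rule whose
body is already established:
\medskip
\begingroup\parindent=0pt\tt
\catcode`\_=12 \catcode`\#=12
\obeylines\obeyspaces
# Forward chaining over propositional Horn clauses.
# A rule is a pair (body, head): if every atom in body holds, so does head.
def closure(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(a in derived for a in body):
                derived.add(head)
                changed = True
    return derived

rules = [(["bird"], "feathered"), (["penguin"], "bird")]
print(sorted(closure(["penguin"], rules)))
# prints ['bird', 'feathered', 'penguin']
\endgroup
\medskip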
Although they express both facts and rules
as logical sentences, third-level systems
are still rather specialized. The axioms
with which the programs begin are not general truths about the world
but are sentences whose meaning and truth is limited to the narrow
domain in which the program has to act. For this reason, the facts
of one program usually cannot be used in a database for other programs.
4. The fourth level is still a goal. It involves representing
general facts about the world as logical sentences. Once put in
a database, the facts can be used by any program. The facts would
have the neutrality of purpose characteristic of much human information.
The supplier of information would not have to understand
the goals of the potential user or how his mind works. The present
ways of ``teaching'' computer programs amount to education
by brain surgery.
A major difficulty is that fourth level systems require extensions
to mathematical logic. One kind of extension is nonmonotonic
reasoning, first proposed in the late 1970s (McCarthy 1977, 1980, 1986),
(Reiter 1980), (McDermott and Doyle 1980). Traditional logic is monotonic
in the following sense. If a sentence $p$ is inferred from a collection
$A$ of sentences, and $B$ is a more inclusive set of sentences,
then $p$ can be inferred from $B$.
For example, let collection $A$ be these sentences: All bachelors are unmarried;
John is a bachelor. From these sentences, you can infer sentence $p$:
John is unmarried. Now let $B$ be this more inclusive collection of
sentences: All bachelors are unmarried; John has no girlfriends; John is a
bachelor. The fact that John is unmarried can be inferred from $B$ as well.
This monotonicity property is a fundamental feature of traditional logic.
It follows from a simple observation: exactly the same proof that
proves $p$ from $A$ will serve as a proof from any more inclusive set $B$.
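The property can be checked mechanically with the closure function
sketched earlier (the atom names below are again invented): enlarging the
set of premisses never removes a conclusion, because every derivation
from the smaller set is also a derivation from the larger.
\medskip
\begingroup\parindent=0pt\tt
\catcode`\_=12 \catcode`\#=12
\obeylines\obeyspaces
# Monotonicity: if A is a subset of B, then closure(A, rules) is a
# subset of closure(B, rules).
rules = [(["bachelor-John"], "unmarried-John")]
A = ["bachelor-John"]
B = A + ["no-girlfriends-John"]
assert closure(A, rules) <= closure(B, rules)
\endgroup
\medskip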
While much human reasoning corresponds to that of traditional
logic, some important human common sense reasoning is not monotonic. We
reach conclusions from certain premisses that we would not reach if
certain other sentences were included in our premisses. For example,
learning that I own a car, you conclude that it is appropriate on a
certain occasion to ask me for a ride, but when you learn the further
fact that the car is in the garage being fixed you no longer draw that
conclusion. Some people think it is possible to try to save
monotonicity by saying that what was in your mind was not a general rule
about asking for a ride from car owners but a probabilistic rule ---
something like: ``On 70 percent of occasions it is appropriate for you to
ask for a ride if I own a car''. So
far it has not proved possible to work out the detailed
epistemology of this approach, i.e. to determine exactly what probabilistic
sentences should be used. Instead AI has moved to directly formalizing
nonmonotonic logical reasoning.
Formalized nonmonotonic reasoning is under rapid development
and many kinds of systems have been proposed. I shall concentrate on
an approach called circumscription, because I know it, and because it
has met with wide acceptance and is perhaps the most actively pursued
at present. The idea is to single out among the models of the collection
of sentences being assumed some ``preferred'' or ``standard'' models.
The preferred models are those that satisfy a certain minimum principle.
What is to be minimized is not yet decided in complete generality,
but many domains that have been studied yield quite general theories
using minimizations of abnormality or of the set of some kind of entity.
The idea is not completely unfamiliar. For example, Ockham's razor ``Do
not multiply entities beyond necessity'' is such a minimum principle.
Minimization in logic is another example of an area of mathematics
being discovered in connection with applications rather than via
the normal internal development of mathematics. Of course, the reverse
is happening on an even larger scale; many logical concepts developed
for purely mathematical reasons turn out to have importance for AI.
As a more concrete example of nonmonotonic reasoning, consider
the conditions under which a boat may be used to cross a river. We all
know of certain things that might be wrong with a boat, e.g. a leak, no
oars or motor or sails depending on what kind of a boat it is. It would
be reasonably convenient to list some of them in a set of axioms.
However, besides those that we can expect to list in advance, a human
reasoner will admit still others, should they arise, but cannot be
expected to think of them in advance (e.g. a fence down the middle of
the river). One can handle this difficulty by using circumscription to minimize
the set of things that prevent the boat from crossing the river, i.e. the
set of obstacles to be overcome. If the reasoner knows of none in a
particular case, he or it will conjecture that the boat can be used, but
if he learns of one, he will get a different result when he minimizes.
This illustrates the fact that nonmonotonic reasoning
is conjectural rather than rigorous.
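In a small propositional setting the minimization can be imitated by
brute force. The sketch below (mine, with invented proposition names, and
in no way a practical implementation of circumscription) enumerates the
models of the boat axioms, keeps those whose set of obstacles is minimal
under inclusion, and shows the conjecture being withdrawn when a leak
becomes known:
\medskip
\begingroup\parindent=0pt\tt
\catcode`\_=12 \catcode`\#=12
\obeylines\obeyspaces
# Toy circumscription: among the models of the axioms, keep those
# whose set of obstacles is inclusion-minimal.
from itertools import chain, combinations

atoms = ["leak", "fence", "crosses"]

def models(axioms):
    subsets = chain.from_iterable(
        combinations(atoms, k) for k in range(len(atoms) + 1))
    return [set(m) for m in subsets if axioms(set(m))]

def axioms(m):
    # The boat crosses exactly when no obstacle is present.
    return ("crosses" in m) == ("leak" not in m and "fence" not in m)

def minimal(ms):
    # Keep the models whose obstacle set is minimal under strict inclusion.
    def obstacles(m):
        return m - set(["crosses"])
    return [m for m in ms if not any(obstacles(n) < obstacles(m) for n in ms)]

print(minimal(models(axioms)))
# one model, in which the boat crosses: knowing of no obstacle,
# we conjecture the boat can be used

def axioms2(m):
    return axioms(m) and "leak" in m   # now a leak is known
print(minimal(models(axioms2)))
# one model, with the leak and no crossing: the conjecture is withdrawn
\endgroup
\medskip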
\bigbreak
\centerline{\bf Appendix C. Applications of Circumscription}
\centerline{\bf to Formalizing Common Sense Knowledge}
\bigskip
{
\scriptfont\itfam=\sevenrm \scriptscriptfont\itfam=\fiverm
\def\mathspace{\mathinner{\mkern-\thinmuskip}}
{\obeyspaces\gdef\rtcmath{\it\obeyspaces\let =\mathspace}}
\everymath={\rtcmath}
\everydisplay={\rtcmath}
% quick and dirty mkop
%{\obeyspaces\gdef {\ }}
\raggedbottom
\def\hcr{\hidewidth\cr}
\centerline{\bf Abstract}
\bigskip
We present a new and more symmetric version of the
circumscription method of nonmonotonic reasoning first described in
(McCarthy 1980) and some
applications to formalizing common sense knowledge. The applications in
this paper are mostly based on minimizing the abnormality of different
aspects of various entities. Included are nonmonotonic treatments of
{\it is-a}
hierarchies, the unique names hypothesis, and the frame problem.
The new circumscription may be called {\it formula circumscription} to
distinguish it from the previously defined {\it domain circumscription} and
{\it predicate circumscription}. A still more general formalism called
{\it prioritized circumscription} is briefly explored.
\sectCounter=0
\bigskip
\section{Introduction and New Definition of Circumscription}
(McCarthy 1980) introduces the circumscription method of
nonmonotonic reasoning and gives motivation, some mathematical properties
and some examples of its application. The present paper is
logically self-contained, but motivation may be enhanced by reading
the earlier paper. We don't repeat its arguments about
the importance of nonmonotonic reasoning in AI, and its examples are
instructive.
Here we give a more symmetric definition of
circumscription and applications to the formal expression of common
sense facts. Our long term goal (far from realized in the present paper)
is to express these facts in a way that would
be suitable for inclusion in a general purpose database of common
sense knowledge. We imagine this database to be used by AI programs
written after the initial preparation of the database. It would
be best if the writers of these programs didn't have to
be familiar with how the common sense facts about particular
phenomena are expressed.
Thus
common sense knowledge must be represented in a way that is not
specific to a particular application.
It turns out that many such common sense facts can be formalized
in a uniform way. A single predicate $ab$, standing for ``abnormal'', is
circumscribed with certain other predicates and functions considered as
variables that can be constrained to achieve the circumscription subject
to the axioms. This also seems to cover the use of circumscription to
represent default rules.
\section{A New Version of Circumscription}
\noindent{\bf Definition.} Let $A(P)$ be a formula of second order logic, where $P$ is
a tuple of some of the free predicate symbols in $A(P)$.
Let $E(P,x)$ be a wff in which
$P$ and a tuple $x$ of individual variables occur free. The circumscription
of $E(P,x)$ relative to $A(P)$ is the formula $A↑\prime(P)$ defined by
%
%\leql{a1}
$$A(P) ∧ ∀P↑\prime.[A(P↑\prime) ∧ [∀x.E(P↑\prime,x) ⊃
E(P,x)] ⊃ [∀x.E(P↑\prime,x) ≡ E(P,x)]].\leql{ai}$$
%
[We are here writing $A(P)$ instead of $A(P↓1,\ldots,P↓n)$ for brevity
and likewise writing $E(P,x)$ instead of
$E(P↓1,\ldots,P↓n,x↓1,\ldots,x↓m)$].
Likewise the quantifier $∀x$ stands for $∀x↓1\ldots x↓m$.
$A(P)$ may have embedded quantifiers.
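For illustration, let $A(P)$ be the single axiom $P(a)$, and let $E(P,x)$
be $P(x)$ itself. Then (\eqref{ai}) becomes
%
$$P(a) ∧ ∀P↑\prime.[P↑\prime(a) ∧ [∀x.P↑\prime(x) ⊃ P(x)] ⊃ [∀x.P↑\prime(x) ≡ P(x)]],$$
%
and substituting the predicate defined by $P↑\prime(x) ≡ x = a$, whose
hypotheses are then satisfied, yields
%
$$∀x.(P(x) ≡ x = a),$$
%
i.e. $a$ is the only object satisfying $P$. (Compare the example of
circumscribing $isblock$ in (McCarthy 1980).)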
Circumscription is a kind of minimization,
and the predicate symbols in $A(P)$ that are not in $P$ itself act as
parameters in this minimization. When we wish to mention
these other predicates we write $A(P;Q)$ and $E(P;Q,x)$ where
$Q$ is a vector of predicate symbols which are not allowed to be varied.
There are two differences between this and (McCarthy 1980). First,
in that paper $E(P,x)$ had the specific form $P(x)$. Here we speak of
circumscribing a wff and call the method {\it formula circumscription\/},
while there we could speak of circumscribing a predicate.
We still speak of circumscribing the predicate $P$ when $E(P,x)$ has
the special form $P(x)$. Formula circumscription is more symmetric in that
any of the predicate symbols in $P$ may be regarded as variables, and a wff
is minimized; the earlier form distinguishes one of the predicates
themselves for minimization. However, formula circumscription is reducible to
predicate circumscription provided we allow as variables predicates
besides the one being minimized.
Second, in definition (\eqref{ai}) we use an explicit quantifier for the predicate
variable $P↑\prime$ whereas in (McCarthy 1980), the formula was a schema.
One advantage of the present formalism is that now $A↑\prime(P)$ is the
same kind of formula as $A(P)$ and can be used as part of the axiom for
circumscribing some other wff.
%Remark: The predicate symbols $P↓1,\ldots,P↓n$ will not usually be all the
%predicate symbols occurring in $A(P)$.
%The situation is analogous to a minimization problem in calculus
%or calculus of variations. Some quantity is being minimized. Other
%quantities are variables whose values are to be chosen so as to achieve
%the minimum. Still others are parameters. They are not varied
%in the minimization process, and therefore the minimum obtained and the
%values of the minimizing variables depend
%on the values of the parameters. We will point out the variables and
%the parameters in the subsequent examples. When we want to be emphatic
%we will write $A(P;Q,x)$, where the $P$ are variable and the $Q$ are
%parameters.
In some of the literature, it has been supposed that nonmonotonic
reasoning involves giving all predicates their minimum extension. This
mistake has led to theorems about what reasoning cannot be
done that are irrelevant to AI and database theory,
because their premisses are too narrow.
\section{A Typology of Uses of Nonmonotonic Reasoning}
Before proceeding to applications of circumscription I want to
suggest a typology of the uses of nonmonotonic reasoning.
Each of the several papers that introduces a mode of nonmonotonic
reasoning seems to have a particular application in mind. Perhaps we are
looking at different parts of an elephant.
The orientation is towards circumscription,
but I suppose the considerations apply to other formalisms as well.
Nonmonotonic reasoning has several uses.
1. As a communication convention. Suppose A tells B about
a situation involving a bird. If the bird cannot fly, and this
is relevant, then A must
say so. Whereas if the bird can fly, there is
no requirement to mention the fact. For example, if I hire you
to build me a bird cage and you don't put a top on it, I can
get out of paying for it even if you tell the judge that I never
said my bird could fly. However, if I complain that you wasted
money by putting a top on a cage I intended for a penguin, the judge
will agree with you that if the bird couldn't fly I should have
said so.
The proposed Common Business Communication Language (CBCL)
(McCarthy 1982) must include nonmonotonic conventions about what
may be inferred when a message leaves out such items as the
method of delivery.
2. As a database or information storage convention. It may be a
convention of a particular database that certain predicates have
their minimal extension. This generalizes the closed world
assumption. When a database makes the closed world assumption
for all predicates it is reasonable to imbed this fact in the
programs that use the database.
However, when only some predicates are to be minimized,
we need to say which ones by appropriate
sentences of the database, perhaps as a preamble
to the collection of ground sentences that usually constitute the
main content.
Neither 1 nor 2 requires that most birds can fly.
Should it happen that most birds that are subject to the communication
or about which information is requested from the data base cannot fly, the
convention may lead to inefficiency but not incorrectness.
3. As a rule of conjecture. This use was emphasized
in (McCarthy 1980). The circumscriptions may be regarded as expressions of some
probabilistic notions such as ``most birds can fly'' or they may be
expressions of standard cases. Thus it is simple to conjecture that
there are no relevant present
material objects other
than those whose presence can be inferred. It is also
a simple conjecture that a tool asserted to be present is usable
for its normal function. Such conjectures sometimes conflict, but there
is nothing wrong with having incompatible conjectures on hand. Besides
the possibility of deciding that one is correct and the other wrong, it
is possible to use one for generating possible exceptions to the other.
4. As a representation of a policy. The example is Doyle's ``The meeting
will be on Wednesday unless another decision is explicitly made''.
Again probabilities are not involved.
5. As a very streamlined expression of probabilistic information when
numerical probabilities, especially conditional probabilities, are
unobtainable. Since circumscription doesn't provide numerical probabilities,
its probabilistic interpretation involves
probabilities that are either infinitesimal, within an
infinitesimal of one, or intermediate --- without any discrimination
among the intermediate values. The circumscriptions give conditional
probabilities. Thus we may treat the probability that a bird
can't fly as an infinitesimal. However, if the rare
event occurs that the bird is a penguin, then the conditional probability that
it can fly is infinitesimal, but we may hear of some rare condition
that would allow it to fly after all.
Why don't we use finite
probabilities combined by the usual laws? That would be fine
if we had the numbers, but circumscription is usable when we can't
get the numbers or find their use inconvenient. Note that the
general probability that a bird can fly may be irrelevant, because
we are interested in the facts that influence our opinion about
whether a particular bird can fly in a particular situation.
Moreover, the use of probabilities is normally considered
to require the definition of a sample space, i.e. the space of all
possibilities. Circumscription allows one to conjecture that the
cases we know about are all that there are. However, when additional
cases are found, the axioms don't have to be changed. Thus there
is no fixed space of all possibilities.
Notice also that circumscription does not provide for
weighing evidence; it is appropriate
when the information permits snap decisions. However, many
cases nominally treated in terms of weighing information are in fact
cases in which the weights are such that circumscription and other
defaults work better.
6. Auto-epistemic reasoning. ``If I had an elder brother, I'd know it''.
This has been studied by R. Moore. Perhaps it can be handled by
circumscription.
7. Both common sense physics and common sense psychology use nonmonotonic
rules. An object will continue in a straight line if nothing interferes
with it. A person will eat when hungry unless something prevents it.
Such rules are open ended about what might prevent the expected behavior,
and this is required, because we are always encountering unexpected
phenomena that modify the operation of our rules. Science, as distinct
from common sense, tries to work with exceptionless rules. However,
this means that common sense reasoning has to decide when a scientific
model is applicable, i.e. that there are no important phenomena not
taken into account by the theories being used and the model of the
particular phenomena.
Seven different uses for nonmonotonic reasoning seem
too many, so perhaps we can condense later.
\section{Minimizing Abnormality}
Many people have proposed representing facts about what is
``normally'' the case. One problem is that every object is abnormal
in some way, and we want to allow some aspects of the object to be
abnormal and still assume the normality of the rest.
We do this with a predicate $ab$ standing for ``abnormal''. We circumscribe
$ab z$. The argument of $ab$ will be some aspect of the entities
involved. Some aspects can be abnormal without affecting others. The
aspects themselves are abstract entities, and their unintuitiveness is
somewhat a blemish on the theory.
The idea is illustrated by the examples of the following sections.
\section{Whether Birds can Fly}
\xdef\whether{\number\scount}
Marvin Minsky challenged us advocates of formal systems
based on mathematical logic to express the facts and nonmonotonic
reasoning concerning the ability of birds to fly.
There are many ways of nonmonotonically axiomatizing the
facts about which birds can fly. The following axioms using $ab$ seem
to me quite straightforward.
%\leql{a4a:}
$$∀x.¬ab aspect1 x ⊃ ¬flies x.\leql{aiva}$$
%
Unless an object is abnormal in $aspect1$, it can't fly. (We're using a
convention that parentheses may be omitted for functions and predicates
of one argument, so that (\eqref{aiva}) is the same as $∀x.(¬ab(aspect1(x))
⊃ ¬flies(x))$.)
It wouldn't work to write $ab x$ instead of $ab aspect1 x$,
because we don't want a bird that is abnormal with respect to its ability
to fly to be automatically abnormal in other respects. Using aspects limits
the effects of proofs of abnormality.
%leql{a5:}
$$∀x.bird x ⊃ ab aspect1 x.\leql{av}$$
%leql{a6:}
$$∀x.bird x ∧ ¬ab aspect2 x ⊃ flies x.\leql{avi}$$
%
Unless a bird is abnormal in $aspect2$, it can fly.
A bird is abnormal in $aspect1$, so (\eqref{aiva}) can't be used to show it
can't fly. If (\eqref{av}) were omitted, when we did the circumscription we
would only be able to infer a disjunction. Either a bird is abnormal in
$aspect1$ or it can fly unless it is abnormal in $aspect2$. (\eqref{av})
expresses our preference for inferring that a bird is abnormal
in $aspect1$ rather than $aspect2$. We call (\eqref{av}) a {\it cancellation of
inheritance} axiom.
%leql{a7:}
$$∀x.ostrich x ⊃ ab aspect2 x.\leql{avii}$$
%
Ostriches are abnormal in $aspect2$. This doesn't say that an ostrich cannot
fly --- merely that (\eqref{avi}) can't be used to infer that it does.
(\eqref{avii}) is another cancellation of inheritance axiom.
%leql{a8:}
$$∀x.penguin x ⊃ ab aspect2 x.\leql{aviii}$$
%
Penguins are also abnormal in $aspect2$.
%leql{a9:}
$$∀x.ostrich x ∧ ¬ab aspect3 x ⊃ ¬flies x.\leql{aix}$$
%
%leql{a10:}
$$∀x.penguin x ∧¬ab aspect4 x ⊃ ¬flies x.\leql{ax}$$
%
Normally ostriches and penguins can't fly. However, there is
an out. (\eqref{aix}) and (\eqref{ax}) provide that under unspecified conditions,
an ostrich or penguin might fly after all. If we give no
such conditions, we will conclude that an ostrich or penguin
can't fly. Additional objects that can fly may be specified.
Each needs two axioms. The first says that it is abnormal in
$aspect1$ and prevents (\eqref{aiva}) from being used to say that it can't
fly. The second provides that it can fly unless it is abnormal
in yet another way. Additional non-flying birds can also be
provided for at a cost of two axioms per kind.
We haven't yet said that ostriches and penguins are birds,
so let's do that and throw in that canaries are birds also.
%
%leql{a11:}
$$∀x.ostrich x ⊃ bird x.\leql{axi}$$
%
%leql{a12:}
$$∀x.penguin x ⊃ bird x.\leql{axii}$$
%
%leql{a13:}
$$∀x.canary x ⊃ bird x.\leql{axiii}$$
%$$∀x.canary x ∧ ¬ab aspect5 x ⊃ bird x.\leql{axiii}$$
%
%We threw in $aspect5$ just in case one wanted to use the term canary
%in the sense of a 1930s gangster movie.
%
Asserting that ostriches, penguins and canaries are birds will help them
inherit other properties from the class of birds. For example, we have
%
%leql{a14:}
$$∀x. bird x ∧ ¬ab aspect6 x ⊃ feathered x.\leql{axiv}$$
%
So far there is nothing to prevent ostriches, penguins and canaries
from overlapping. We could write disjointness axioms like
%
%leql{a15:}
$$∀x. ¬ostrich x ∨ ¬penguin x,\leql{axv}$$
%
but we require $n↑2$ of them if we have $n$ species.
It is more efficient to write axioms like
%
%leql{a15a:}
$$∀x.ostrich x ⊃ species x ={}↑\prime{}ostrich,\leql{axva}$$
%
which makes the $n$ species disjoint with only $n$ axioms assuming
that the distinctness of the names is apparent to the reasoner.
This problem is like the unique names problem.
If these are the only facts to be taken into account, we
must somehow specify that what can fly is to be determined
by circumscribing the wff $ab z$ using $ab$ and $flies$ as variables.
Why exactly these? If $ab$ were not taken as variable, $ab z$ couldn't
vary either, and the minimization problem would go away. Since
the purpose of the axiom set is to describe what flies, the predicate
$flies$ must be varied also. Suppose we contemplate taking $bird$ as
variable also. In the first place, this violates an intuition that
deciding what flies follows deciding what is a bird in the common
sense situations we want to cover. Secondly, if we use exactly the
above axioms and admit bird as a variable, we will further conclude
that the only birds are penguins, canaries and ostriches. Namely,
for these entities something has to be abnormal, and therefore minimizing
$ab z$ will involve making as few entities as possible penguins, canaries
and ostriches. If we also admit $penguin$, $ostrich$, and $canary$ as
variable, we will succeed in making $ab z$ always false, and there will be
no birds at all.
However, if the same circumscriptions are done with additional
axioms like $canary Tweety$ and $ostrich Joe$, we will get the
expected result that Tweety can fly and Joe cannot even if all the
above are variable.
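This claim is small enough to check by machine. The following sketch
(ours, written only to verify the claim, not as a proposed implementation
of circumscription) enumerates all interpretations of $ab$ and $flies$
over the two-element domain consisting of Tweety and Joe, keeps the
models of axioms (\eqref{aiva}), (\eqref{av}), (\eqref{avi}),
(\eqref{avii}) and (\eqref{aix}) together with $canary Tweety$ and
$ostrich Joe$, and selects those in which the extension of $ab$ is
minimal; for brevity the species predicates are held fixed rather than
varied:
\medskip
\begingroup\parindent=0pt\tt
\catcode`\_=12 \catcode`\#=12
\obeylines\obeyspaces
# Circumscribing ab z over the domain Tweety (a canary) and Joe (an
# ostrich), holding bird and ostrich fixed, varying ab and flies.
from itertools import chain, combinations

domain = ["Tweety", "Joe"]
bird = ["Tweety", "Joe"]
ostrich = ["Joe"]
aspects = [(a, x) for a in (1, 2, 3) for x in domain]

def powerset(xs):
    return chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))

def holds(ab, flies):
    # The conjunction of the five axioms of the flying example.
    for x in domain:
        if (1, x) not in ab and x in flies:
            return False   # not ab aspect1 x implies not flies x
        if x in bird and (1, x) not in ab:
            return False   # bird x implies ab aspect1 x
        if x in bird and (2, x) not in ab and x not in flies:
            return False   # bird x and not ab aspect2 x implies flies x
        if x in ostrich and (2, x) not in ab:
            return False   # ostrich x implies ab aspect2 x
        if x in ostrich and (3, x) not in ab and x in flies:
            return False   # ostrich x and not ab aspect3 x implies not flies x
    return True

models = [(frozenset(ab), frozenset(fl))
          for ab in powerset(aspects) for fl in powerset(domain)
          if holds(ab, fl)]
minimal = [(ab, fl) for ab, fl in models
           if not any(ab2 < ab for ab2, fl2 in models)]
for ab, fl in minimal:
    print(sorted(fl))
# prints ['Tweety'] exactly once: in the unique minimal model
# Tweety flies and Joe does not
\endgroup
\medskip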
While this works, it may be more straightforward, and
therefore less likely to lead to subsequent trouble, to circumscribe
birds, ostriches and penguins with axioms like
%leql{a2:}
$$∀x.¬ab aspect8 x ⊃ ¬bird x.\leql{aii}$$
We have not yet specified how a program will know what to
circumscribe. One extreme is to build it into the program, but
this is contrary to the declarative spirit. However, a statement
of what to circumscribe isn't just a sentence of the language
because of its nonmonotonic character. Another possibility
is to include some sort of metamathematical
statement like
%leql{a16:}
$$circumscribe(ab z; ab, flies, bird, ostrich, penguin)\leql{axvi}$$
%
in a ``policy'' database available to the program. (\eqref{axvi}) is
intended to mean that $ab z$ is to be circumscribed with $ab$, $flies$,
$bird$, $ostrich$ and $penguin$ taken as variable. Explicitly listing
the variables makes adding new kinds awkward, since they will have
to be mentioned in the $circumscribe$ statement. Section 11 on {\it simple
abnormality theories} presents yet another possibility.
\section{The Unique Names Hypothesis}
Raymond Reiter (1980a) introduced the phrase ``unique names hypothesis''
for the assumption that each object has a unique name, i.e.
that distinct names denote distinct objects. We want
to treat this nonmonotonically. Namely, we want a wff that picks out
those models of our initial assumptions that maximize the inequality of
the denotations of constant symbols.
While we're at it, we might as well try for something
stronger. We want to maximize the extent to which distinct terms designate
distinct objects. When there is a unique model of the axioms that maximizes
distinctness, we can put it more simply; two terms denote distinct objects
unless the axioms force them to denote the same.
If we are even more
fortunate, as we are in the examples to be given, we can say that two
terms denote distinct objects unless their equality is provable.
We don't know a completely satisfactory way of doing this.
Suppose that we have a language $L$ and a theory $T$
consisting of the consequences of a formula $A$.
It would be most pleasant if we could just circumscribe equality,
but as Etherington, Mercer and Reiter (1985) point out, this doesn't
work, and nothing similar works. We could hope to circumscribe
some other formula of $L$, but this doesn't seem
to work either. Failing that, we could hope for some other second
order formula taken from $L$ that would express the unique names
hypothesis, but we don't presently see how to do it.
Our solution involves extending the language by introducing
the names themselves as the only objects. All assertions about objects
are expressed as assertions about the names.
We suppose our theory is such
that the names themselves
are all provably distinct. There are several ways of doing
this. Let the names be $n↓1$, $n↓2$, etc. The simplest solution is
to have an axiom $n↓i ≠n↓j$ for each pair of distinct names. This
requires a number of axioms proportional to the square of the number
of names, which is sometimes objectionable. The next solution involves
introducing an arbitrary ordering on the names. We have special
axioms $n↓1 < n↓2, n↓2 < n↓3, n↓3 < n↓4$, etc. and the general axioms
$∀x y.x < y ⊃ x ≠ y$ and $∀x y z. x < y ∧ y < z ⊃ x < z$. This
makes the number of axioms proportional to the number of names.
A third possibility involves mapping the names onto integers with
axioms like $index n↓1 = 1, index n↓2 = 2$, etc. and using a theory
of the integers that provides for their distinctness. The fourth
possibility involves using string constants for the names and
``attaching'' to equality in the language a subroutine that computes
whether two strings are equal. If our names were quoted symbols as
in LISP, this amounts to having ${}↑\prime{}a ≠{}↑\prime{}b$ and all its countable
infinity of analogs as axioms. Each of these devices is useful
in appropriate circumstances.
From the point of view of mathematical logic, there is no harm in
having an infinity of such axioms. From the computational point of view
of a theorem proving or problem solving program, we merely suppose that we
rely on the computer to generate the assertion that two names are distinct
whenever this is required, since a subroutine can easily tell whether two
strings are the same.
Besides axiomatizing the distinctness of the constants, we also
want to axiomatize the distinctness of terms. This may be accomplished
by providing for each function two axioms. Letting $foo$ be a function
of two arguments we postulate
%leql{ln1}
$$∀x↓1 x↓2 y↓1 y↓2.foo(x↓1,y↓1) = foo(x↓2,y↓2) ⊃ x↓1 = x↓2 ∧ y↓1 = y↓2\leql{lni}$$
%
and
%leql{ln2}
$$∀x y.fname foo(x,y) ={}↑\prime{}foo.\leql{lnii}$$
%
The first axiom ensures that unless the arguments of $foo$ are identical,
its values are distinct. The second ensures that the values of $foo$
are distinct from the values of any other function or any constant,
assuming that we refrain from naming any constant ${}↑\prime{}foo$.
These axioms amount to making our domain isomorphic to
an extension of the Herbrand universe of the language.
Now that the names are guaranteed distinct, what about the
objects they denote? We introduce a
predicate $e(x,y)$ and axiomatize it to be an equivalence relation.
Its intended interpretation is that the names $x$ and $y$ denote the same
object. We then formulate all our usual axioms in terms of names
rather than in terms of objects. Thus $on(n↓1,n↓2)$ means that the object
named by $n↓1$ is on the object named by $n↓2$, and $bird x$ means that the
name $x$ denotes a bird. We add axioms of substitutivity for $e$ with
regard to those predicates and functions
that are translates of predicates referring
to objects rather than predicates on the names themselves. Thus for a
predicate $on$ and a function $foo$ we
may have axioms
%leql{a4}
$$∀n↓1 n↓2 n↓1↑\prime n↓2↑\prime.e(n↓1,n↓1↑\prime) ∧ e(n↓2,n↓2↑\prime) ⊃
(on(n↓1,n↓2) ≡ on(n↓1↑\prime,n↓2↑\prime))\leql{aiv}$$
%
and
%leql{n3}
$$∀x↓1 x↓2 y↓1 y↓2.e(x↓1,x↓2) ∧ e(y↓1,y↓2) ⊃ e(foo(x↓1,y↓1),foo(x↓2,y↓2)).\leql{niii}$$
If for some class $C$ of names, we wish to assert the unique
names hypothesis, we simply use an axiom like
%leql{a4c}
$$∀n↓1 n↓2.n↓1 ε C ∧ n↓2 ε C ⊃ (e(n↓1,n↓2) ≡ n↓1 = n↓2).\leql{aivc}$$
However, we often want only to assume that distinct names denote
distinct objects when this doesn't contradict our other assumptions.
In general, our axioms won't permit making all names distinct simultaneously,
and there will be several models with maximally distinct objects. The
simplest example is obtained by circumscribing $e(x,y)$ while adhering
to the axiom
%leql{a4b}
$$e(n↓1,n↓2) ∨ e(n↓1,n↓3)\leql{aivb}$$
%
where $n↓1$, $n↓2$, and $n↓3$ are distinct names. There will then be two
models, one satisfying $e(n↓1,n↓2) ∧ ¬e(n↓1,n↓3)$ and the other satisfying
$¬e(n↓1,n↓2) ∧ e(n↓1,n↓3).$
Thus circumscribing $e(x,y)$ maximizes uniqueness of names.
If we only want unique names for some class $C$ of names, then we
circumscribe the formula
$$x ε C ∧ y ε C ⊃ e(x,y).$$
An example of such a circumscription is given in Appendix B.
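The two maximally distinct models can be exhibited by enumeration. In the
sketch below (ours, purely illustrative) an equivalence relation on the
three names is represented as a partition, and we keep the models of
(\eqref{aivb}) that identify an inclusion-minimal set of pairs:
\medskip
\begingroup\parindent=0pt\tt
\catcode`\_=12 \catcode`\#=12
\obeylines\obeyspaces
# Enumerate the five equivalence relations on three names and keep
# those satisfying e(n1,n2) or e(n1,n3) with a minimal set of
# identified pairs.
partitions = [
    [["n1"], ["n2"], ["n3"]],
    [["n1", "n2"], ["n3"]],
    [["n1", "n3"], ["n2"]],
    [["n2", "n3"], ["n1"]],
    [["n1", "n2", "n3"]],
]
names = ["n1", "n2", "n3"]

def e(part, x, y):
    return any(x in block and y in block for block in part)

def pairs(part):
    return frozenset((x, y) for x in names for y in names
                     if x < y and e(part, x, y))

models = [p for p in partitions if e(p, "n1", "n2") or e(p, "n1", "n3")]
minimal = [p for p in models
           if not any(pairs(q) < pairs(p) for q in models)]
for p in minimal:
    print(p)
# prints the partition identifying n1 with n2 and the one identifying
# n1 with n3: the two maximally distinct models
\endgroup
\medskip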
However, there
seems to be a price. Part of the price is admitting names as objects.
Another part is admitting the predicate $e(x,y)$ which is substitutive
for predicates and functions
of names that really are about the objects denoted by
the names. $e(x,y)$ is not to be taken as substitutive for predicates
on names that aren't about the objects. Of these our only present example
is equality. Thus we don't have
%leql{*}
$$∀n↓1 n↓2 n↓1↑\prime n↓2↑\prime . e(n↓1,n↓1↑\prime) ∧ e(n↓2,n↓2↑\prime) ⊃
((n↓1 = n↓2) ≡ (n↓1↑\prime = n↓2↑\prime)).\leqno (\ast)$$
The awkward part of the price is that we must refrain from any
functions whose values are the objects themselves rather than names.
They would spoil the circumscription by not allowing us to infer the
distinctness of the objects denoted by distinct names. Actually, we
can allow them provided we don't include the axioms involving them
in the circumscription. Unfortunately, this spoils the other property
of circumscription that lets us take any facts into account.
The examples of the use of circumscription in AI in the rest
of the paper don't interpret the variables as merely ranging over
names. Therefore, they are incompatible with getting unique names
by circumscription as described in this section. Presumably it wouldn't
be very difficult to revise those axioms for compatibility with the
present approach to unique names.
\section{Two Examples of Raymond Reiter}
Reiter asks about representing, ``Quakers are normally pacifists
and Republicans are normally non-pacifists. How about Nixon, who is
both a Quaker and a Republican?'' Systems of
nonmonotonic reasoning that use non-provability as a basis for
inferring negation will infer that Nixon is neither a pacifist
nor a non-pacifist. Combining these conclusions with the original
premiss leads to a contradiction.
%
We use
%leql{c1}
$$∀x.quaker x ∧ ¬ab aspect1 x ⊃ pacifist x,\leql{ci}$$
%leql{c2}
$$∀x.republican x ∧ ¬ab aspect2 x ⊃ ¬ pacifist x\leql{cii}$$
%
and
%leql{c3}
$$quaker Nixon ∧ republican Nixon.\leql{ciii}$$
When we circumscribe $ab z$ using these three sentences as
$A(ab,pacifist)$, we will only be able to conclude that Nixon is either abnormal
in $aspect1$ or in $aspect2$, and we will not be able to say whether he
is a pacifist. Of course, this is the same conclusion as would be reached
without circumscription. The point is merely that we avoid contradiction.
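The conclusion can again be verified by enumeration. In the sketch below
(ours, illustrative only) a model is a triple of truth values for
$ab aspect1 Nixon$, $ab aspect2 Nixon$ and $pacifist Nixon$:
\medskip
\begingroup\parindent=0pt\tt
\catcode`\_=12 \catcode`\#=12
\obeylines\obeyspaces
# The Nixon example over a one-element domain.  The two axioms are:
# not a1 implies pac (the Quaker default) and not a2 implies not pac
# (the Republican default).
from itertools import product

models = [(a1, a2, pac)
          for a1, a2, pac in product((False, True), repeat=3)
          if (a1 or pac) and (a2 or not pac)]

def less(n, m):
    # n is strictly less abnormal than m
    return n[0] <= m[0] and n[1] <= m[1] and (n[0], n[1]) != (m[0], m[1])

minimal = [m for m in models if not any(less(n, m) for n in models)]
print(minimal)
# prints (False, True, True) and (True, False, False): in one minimal
# model Nixon is a pacifist and in the other he is not, so only the
# disjunction of the two abnormalities is settled
\endgroup
\medskip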
Reiter's second example is that a person normally lives in the same
city as his wife and in the same city as his employer. But A's wife
lives in Vancouver and A's employer is in Toronto.
We write
%leql{c4}
$$∀x.¬ab aspect1 x ⊃ city x = city wife x\leql{civ}$$
%
and
%leql{c5}
$$∀x.¬ab aspect2 x ⊃ city x = city employer x.\leql{cv}$$
%
If we have
%leql{c6}
$$city wife A = Vancouver ∧ city employer A = Toronto ∧ Toronto
≠ Vancouver,\leql{cvi}$$
%
we will again only be able to conclude that A lives either in Toronto
or Vancouver. In this circumscription, the function $city$ must be taken
as variable. This might be considered not entirely satisfactory. If one knows
that a person either doesn't live in the same city as his wife or doesn't live
in the same city as his employer, then there is an increased probability that
he doesn't live in the same city as either. A system that did reasoning of
this kind would seem to require a larger body of facts and perhaps more
explicitly metamathematical reasoning. Not knowing how to do that, we might
want to use $aspect1 x$ in both (\eqref{civ}) and (\eqref{cv}). Then
we would conclude nothing about his city once we knew that he wasn't in the
same city as both.
\section{A More General Treatment of an {\it is-a} Hierarchy}
The bird example works fine when a fixed {\it is-a} hierarchy
is in question. However, our writing the inheritance cancellation axioms
depended on knowing exactly from what higher level the properties were
inherited. This doesn't correspond to my intuition of how we humans
represent inheritance. It would seem rather that when we say that
birds can fly, we don't necessarily have in mind that an
inheritance of inability to fly from things in general is being cancelled.
We can formulate inheritance of properties in a more general way provided
we reify the properties. Presumably there are many ways of doing this,
but here's one that seems to work.
The first order variables of our theory range over classes of
objects (denoted by $c$ with numerical suffixes), properties (denoted
by $p$) and objects (denoted by $x$). We don't identify our classes
with sets (or with the classes of G\"odel-Bernays set theory).
In particular, we don't
assume extensionality. We have several predicates:
$ordinarily(c,p)$ means that objects of class $c$ ordinarily have
property $p$. $c1 ≤ c2$ means that class $c1$ ordinarily inherits from
class $c2$. We assume that this relation is transitive. $in(x,c)$ means
that the object $x$ is in class $c$. $ap(p,x)$ means that property $p$
applies to object $x$. Our axioms are
%leql{d0}
$$∀c1 c2 c3. c1 ≤ c2 ∧ c2 ≤ c3 ⊃ c1 ≤ c3,\leql{dzero}$$
%leql{d1}
$$∀c1 c2 p.ordinarily(c2,p) ∧ c1 ≤ c2 ∧ ¬ab aspect1(c1,c2,p)
⊃ ordinarily(c1,p),\leql{di}$$
%leql{d2}
$$∀c1 c2 c3 p.c1 ≤ c2 ∧ c2 ≤ c3 ∧ ordinarily(c2, not p)
⊃ ab aspect1(c1,c3,p),\leql{dii}$$
%leql{d3}
$$∀x c p.in(x,c) ∧ ordinarily(c,p) ∧ ¬ab aspect2(x,c,p) ⊃ ap(p,x),\leql{diii}$$
%leql{d4}
$$∀x c1 c2 p.in(x,c1) ∧ c1 ≤ c2 ∧ ordinarily(c1,not p)
⊃ ab aspect2(x,c2,p).\leql{div}$$
%
Axiom (\eqref{dzero}) is the afore-mentioned transitivity of $≤$. (\eqref{di}) says
that properties that ordinarily hold for a class are inherited unless
something is abnormal. (\eqref{dii}) cancels the inheritance if there
is an intermediate class for which the property ordinarily doesn't
hold. (\eqref{diii}) says that properties which ordinarily hold actually
hold for elements of the class unless something is abnormal. (\eqref{div})
cancels the effect of (\eqref{diii}) when there is an intermediate class
for which the negation of the property ordinarily holds. Notice that
this reification of properties seems to require imitation boolean
operators. Such operators are discussed in (McCarthy 1979a).
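
To see the axioms in action, here is a small Python sketch of the intended
minimal model over a three-class hierarchy. The encoding involves choices the
axioms leave open and is ours alone: properties are (name, sign) pairs so
that $not\;p$ is a sign flip, $≤$ is computed as the reflexive-transitive
closure of the base links, and the classes containing each object are listed
explicitly.

# A finite-model sketch of axioms (d0)-(d4); encoding assumptions: classes
# are strings, a property is a (name, sign) pair, "not p" flips the sign.

CLASSES = {"ostriches", "birds", "things"}
BASE_LE = {("ostriches", "birds"), ("birds", "things")}
neg = lambda p: (p[0], not p[1])

def closure(base):
    # (d0) plus reflexivity: reflexive-transitive closure of the is-a links
    rel = {(c, c) for c in CLASSES} | set(base)
    changed = True
    while changed:
        changed = False
        for a, b in list(rel):
            for c, d in list(rel):
                if b == c and (a, d) not in rel:
                    rel.add((a, d)); changed = True
    return rel

LE = closure(BASE_LE)

# base "ordinarily" facts
ORD = {("things", ("flies", False)),
       ("birds", ("flies", True)),
       ("ostriches", ("flies", False))}

changed = True
while changed:              # (d1) with ab minimal: (d2) blocks inheritance
    changed = False
    for c1, c2 in LE:
        for c, p in list(ORD):
            if c != c2:
                continue
            blocked = any((c1, cm) in LE and (cm, c2) in LE
                          and (cm, neg(p)) in ORD for cm in CLASSES)
            if not blocked and (c1, p) not in ORD:
                ORD.add((c1, p)); changed = True

def applies(p, classes_of_x):
    # (d3) with ab minimal: (d4) cancels when a lower class says "not p"
    return any((c, p) in ORD and
               not any((c1, c) in LE and (c1, neg(p)) in ORD
                       for c1 in classes_of_x)
               for c in classes_of_x)

print(applies(("flies", True), {"birds", "things"}))               # True
print(applies(("flies", True), {"ostriches", "birds", "things"}))  # False
print(applies(("flies", False), {"ostriches", "birds", "things"})) # True
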
\section{The Blocks World}
The following set of ``situation calculus'' axioms solves the
frame problem for a blocks world in which blocks can be moved and painted.
Here $result(e,s)$ denotes the situation that results when event $e$ occurs
in situation $s$. The formalism is approximately that of (McCarthy and Hayes
1969).
%leql{b1}
$$∀x e s.¬ab aspect1(x,e,s) ⊃ location(x,result(e,s))
= location(x,s).\leql{bi}$$
%leql{b2}
$$∀x e s.¬ab aspect2(x,e,s) ⊃ color(x,result(e,s)) = color(x,s).\leql{bii}$$
%
Objects change their locations and colors only for a reason.
%leql{b3}
$$∀x l s.ab aspect1(x,move(x,l),s)\leql{biii}$$
%
and
$$∀x l s.¬ab aspect3(x,l,s) ⊃ location(x,result(move(x,l),s)) = l.$$
%leql{b4}
$$∀x c s.ab aspect2(x,paint(x,c),s)\leql{biv}$$
%
and
$$∀x c s.¬ab aspect4(x,c,s) ⊃ color(x,result(paint(x,c),s)) = c.$$
%
Objects change their locations when moved and their colors when painted.
%leql{b5}
$$∀x l s.¬clear(top x,s) ∨ ¬clear(l,s) ∨ tooheavy x ∨ l = top x$$
%$$⊃ ab aspect3(x,move(x,l),s).\leql{bv}$$
$$⊃ ab aspect3(x,l,s).\leql{bv}$$
%
This prevents the
rule (\eqref{biii}) from being used to infer that an object will move
if its top isn't clear or to a destination that isn't clear
or if the object is too heavy. An object also cannot be moved to
its own top.
%leql{b6}
$$∀l s.clear(l,s) ≡ ¬∃x.(¬trivial x ∧ location(x,s) = l).\leql{bvi}$$
%
A location is clear if all the objects there are trivial, e.g. a speck of dust.
%leql{b7}
$$∀x.¬ab aspect7 x ⊃ ¬trivial x.\leql{bvii}$$
%
Trivial objects are abnormal in $aspect7$.
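
The intended minimal model of these axioms can be simulated directly. The
following Python sketch is our own reading of (\eqref{bi})---(\eqref{bvii}),
assuming a world with no trivial and no too-heavy objects; as discussed below
in connection with prioritized circumscription, simple circumscription of
$ab$ does not by itself single out this model.

# Simulation of the intended model of (b1)-(b7); assumes no trivial and
# no too-heavy objects, so "clear" means "nothing is at that location".

top = lambda b: ("top", b)

def clear(loc, state):
    return all(where != loc for where in state["loc"].values())   # (b6)

def result(event, state):
    new = {"loc": dict(state["loc"]), "color": dict(state["color"])}
    if event[0] == "move":
        _, x, l = event
        # (b5): no change unless top x and l are clear and l != top x
        if clear(top(x), state) and clear(l, state) and l != top(x):
            new["loc"][x] = l            # aspect3 normal: the move succeeds
    elif event[0] == "paint":
        _, x, c = event
        new["color"][x] = c              # aspect4 normal: painting succeeds
    return new

S0 = {"loc": {"A": "Table", "B": "Table"},
      "color": {"A": "white", "B": "white"}}
S1 = result(("move", "A", top("B")), S0)
S2 = result(("move", "B", top("A")), S1)
print(S1["loc"])   # {'A': ('top', 'B'), 'B': 'Table'}
print(S2["loc"])   # unchanged: top B is occupied, so the second move fails
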
\section{An Example of Doing the Circumscription}
In order to keep the example short we will take into account
only the following facts from the earlier section on flying.
%leql{a4a}
$$∀x.¬ab aspect1 x ⊃ ¬flies x.\leqno(\rm \eqref{aiva})$$
%leql{a5}
$$∀x.bird x ⊃ ab aspect1 x.\leqno(\rm \eqref{av})$$
%leql{a6}
$$∀x.bird x ∧ ¬ab aspect2 x ⊃ flies x.\leqno(\rm \eqref{avi})$$
%leql{a7}
$$∀x.ostrich x ⊃ ab aspect2 x.\leqno(\rm \eqref{avii})$$
%leql{a9}
$$∀x.ostrich x ∧ ¬ab aspect3 x ⊃ ¬flies x.\leqno(\rm \eqref{aix})$$
Their conjunction is taken as $A(ab,flies)$. This means
that what entities satisfy $ab$ and what entities satisfy $flies$ are
to be chosen so as to minimize $ab z$. Which objects are birds
and which are ostriches are parameters rather than variables, i.e.
what objects are birds is considered given.
We also need an axiom that asserts that the aspects are different.
Here is a straightforward version that would be rather long were there
more than three aspects.
%
$$\vcenter{\openup\jot\ialign{$#$\hfil\cr
\mskip-\thickmuskip (∀x y.¬(aspect1 x = aspect2 y))\cr
∧ (∀x y.¬(aspect1 x = aspect3 y))\cr
∧ (∀x y.¬(aspect2 x = aspect3 y))\cr
∧ (∀x y.aspect1 x = aspect1 y ≡ x =y)\cr
∧ (∀x y.aspect2 x = aspect2 y ≡ x =y)\cr
∧ (∀x y.aspect3 x = aspect3 y ≡ x =y).\cr
}}$$
%
We could include this axiom in $A(ab,flies)$, but as we shall see,
it won't matter whether we do, because it contains neither $ab$ nor
$flies$.
The circumscription formula $A↑\prime(ab,flies)$ is then
%leql{a19}
$$A(ab,flies)
∧ ∀ab↑\prime flies↑\prime.[A(ab↑\prime,flies↑\prime) ∧
[∀x. ab↑\prime x ⊃ ab x] ⊃ [∀x. ab x ≡ ab↑\prime x]],\leql{axix}$$
%
which spelled out becomes
%
%leql{a20}
$$\vcenter{\openup\jot\ialign{&$#$\hfil\cr
\mskip-\thickmuskip [∀x.¬ab aspect1 x ⊃ ¬flies x]\cr
∧ [∀x.bird x ⊃ ab aspect1 x]\hcr
∧ [∀x.bird x ∧ ¬ab aspect2 x ⊃ flies x]\hcr
∧ [∀x.ostrich x ⊃ ab aspect2 x]\hcr
∧ [∀x.ostrich x ∧ ¬ab aspect3 x ⊃ ¬flies x]\hcr
∧ ∀ab↑\prime flies↑\prime.&[[∀x.¬ab↑\prime aspect1 x ⊃ ¬flies↑\prime x]\cr
&
∧ [∀x.bird x ⊃ ab↑\prime aspect1 x]\cr
&
∧ [∀x.bird x ∧ ¬ab↑\prime aspect2 x ⊃ flies↑\prime x]\cr
&
∧ [∀x.ostrich x ⊃ ab↑\prime aspect2 x]\cr
&
∧ [∀x.ostrich x ∧ ¬ab↑\prime aspect3 x ⊃ ¬flies↑\prime x]\cr
&
∧ [∀z.ab↑\prime z ⊃ ab z]\cr
⊃ [∀z.ab z ≡ ab↑\prime z]].\hcr
}}
\leql{axx}$$
$A(ab,flies)$ is guaranteed to be true, because it is part of what
is assumed in our common sense data base. Therefore (\eqref{axx}) reduces
to
%
%leql{a20a}
$$\vcenter{\openup\jot\ialign{$#$\hfil\cr
\mskip-\thickmuskip ∀ab↑\prime flies↑\prime.[[∀x.¬ab↑\prime aspect1 x ⊃ ¬flies↑\prime x]\cr
∧ [∀x.bird x ⊃ ab↑\prime aspect1 x]\cr
∧ [∀x.bird x ∧ ¬ab↑\prime aspect2 x ⊃ flies↑\prime x]\cr
∧ [∀x.ostrich x ⊃ ab↑\prime aspect2 x]\cr
∧ [∀x.ostrich x ∧ ¬ab↑\prime aspect3 x ⊃ ¬flies↑\prime x]\cr
∧ [∀z.ab↑\prime z ⊃ ab z]\cr
⊃ [∀z.ab z ≡ ab↑\prime z]].\cr
}}\leql{axxa}$$
Our objective is now to make suitable substitutions for $ab↑\prime$ and
$flies↑\prime$ so that all the terms preceding the $⊃$ in
(\eqref{axxa}) will be true,
and the right side will determine $ab$. The axiom $A(ab,flies)$ will then
determine $flies$, i.e. we will know what the fliers are.
$flies↑\prime$ is easy, because we need only apply wishful
thinking; we want the fliers to be just those birds that aren't ostriches.
Therefore, we put
%leql{a21}
$$flies↑\prime x ≡ bird x ∧ ¬ostrich x.\leql{axxi}$$
$ab↑\prime$ isn't really much more difficult, but there is a notational
problem. We define
%leql{a22}
$$ab↑\prime z ≡ [∃x.bird x ∧ z = aspect1 x]
∨ [∃x.ostrich x ∧ z = aspect2 x],\leql{axxii}$$
%
which covers the cases we want to be abnormal.
Appendix A contains a complete proof as accepted by Jussi
Ketonen's (1984) interactive theorem prover EKL. EKL uses the theory of
types and therefore has no problem with the second order logic required by
circumscription.
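
For a finite domain, the claim that these substitutions satisfy the axioms
can also be checked mechanically. The following Python sketch is a finite
test over a made-up three-element domain, not a substitute for the EKL proof
mentioned above: it instantiates the five axioms with (\eqref{axxi}) and
(\eqref{axxii}).

# Finite check that the substitutions (axxi), (axxii) satisfy A(ab',flies').
# Together with (av) and (avii), which force ab' to be a subset of ab, this
# lets (axxa) conclude ab = ab'.  Domain and names are ours.

objects = ["Tweety", "Ozzy", "Rock"]          # a bird, an ostrich, a stone
bird    = {"Tweety": True,  "Ozzy": True,  "Rock": False}
ostrich = {"Tweety": False, "Ozzy": True,  "Rock": False}

flies1 = lambda x: bird[x] and not ostrich[x]                    # (axxi)
ab1 = lambda z: (z[0] == 1 and bird[z[1]]) or \
                (z[0] == 2 and ostrich[z[1]])                    # (axxii)
implies = lambda p, q: (not p) or q

ok = all(
    implies(not ab1((1, x)), not flies1(x)) and                  # (aiva)
    implies(bird[x], ab1((1, x))) and                            # (av)
    implies(bird[x] and not ab1((2, x)), flies1(x)) and          # (avi)
    implies(ostrich[x], ab1((2, x))) and                         # (avii)
    implies(ostrich[x] and not ab1((3, x)), not flies1(x))       # (aix)
    for x in objects)
print(ok)   # True: the fliers are exactly the non-ostrich birds
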
\section{Simple Abnormality Theories}
The examples in this paper all circumscribe the predicate $ab$.
However, they differ in what they take as variable in the circumscription.
The declarative expression of common sense requires that we be
definite about what is circumscribed and what is variable in
the circumscription. We have the following objectives.
1. The general facts of common sense are described
by a collection of sentences that are not oriented in advance
to particular problems.
2. It is prescribed how to express the facts of particular
situations including the goals to be achieved.
3. The general system prescribes what is to be circumscribed
and what is variable.
4. Once the facts to be taken into account are chosen, the
circumscription process proceeds in a definite way resulting in a definite
theory --- in general second order.
5. The conclusions reached taking a given set of facts into
account are intuitively reasonable.
These objectives are the same as those of (McCarthy 1959)
except that that paper used only monotonic reasoning.
The examples of this paper suggest defining a {\it simple
abnormality formalism} used as follows.
1. The general facts include $ab$ and a variety of aspects.
2. The specific facts do not involve $ab$.
3. The circumscription of $ab$ is done with all predicates
variable. This means that the axioms must be sufficient
to tie them down.
I had hoped that the {\it simple abnormality formalism} would
be adequate to express common sense knowledge. Unfortunately,
this seems not to be the case. Consider the following axioms.
%
$$¬ab aspect1 x ⊃ ¬flies x,$$
%
$$bird x ⊃ ab aspect1 x,$$
%
$$bird x ∧ ¬ab aspect2 x ⊃ flies x,$$
%
%$$canary x ∧ ¬ab aspect3 x ⊃ flies x,$$ [This wrong line fixed 1986 Feb 3]
$$canary x ∧ ¬ab aspect3 x ⊃ bird x,$$
%
$$canary Tweety.$$
%
We ask whether Tweety flies. Simply circumscribing $ab$ leaves this
undecided, because Tweety can either be abnormal in $aspect1$ or
in $aspect3$. Common sense tells us that we should conclude that
Tweety flies. This can be achieved by preferring to have Tweety
abnormal in $aspect1$ to having Tweety abnormal in $aspect3$. It
is not yet clear whether this can be done using the {\it simple
abnormality formalism}. Our approach to solving this problem is
discussed in the following section on prioritized circumscription.
However, simple abnormality theories may be adequate for
an interesting set of common sense axiomatizations.
\section{Prioritized Circumscription}
An alternate way of introducing formula circumscription is by
means of an ordering on tuples of predicates satisfying an axiom. We
define $P ≤ P↑\prime$ by
%leql{k10}
$$∀P P↑\prime.P ≤ P↑\prime ≡ ∀x.E(P,x) ⊃ E(P↑\prime,x).\leql{kx}$$
%
That $P0$ is a relative minimum in this ordering is expressed by
%leql{k11}
$$∀P.P ≤ P0 ⊃ P = P0,\leql{kxi}$$
%
where equality is interpreted extensionally, i.e. we have
%leql{k12}
$$∀P P↑\prime.P = P↑\prime ≡ (∀x.E(P,x) ≡ E(P↑\prime,x)).\leql{kxii}$$
%
Assuming that we look for a minimum among predicates $P$
satisfying $A(P)$, (\eqref{kxi}) then expands precisely to the
circumscription formula (\eqref{ai}). In some earlier expositions of
circumscription this ordering approach was used, and Vladimir Lifschitz in
a recent seminar advocated returning to it as a more fundamental and
understandable concept.
I'm beginning to think he's right about it being more
understandable, and there seems to be a more fundamental reason for using
it. Namely, certain common sense axiomatizations are easier to formalize
if we use a new kind of ordering, and circumscription based on this kind
of ordering doesn't seem to reduce to ordinary formula circumscription.
We call it {\it prioritized circumscription.}
Suppose we write some bird axioms in the form
%leql{k1}
$$∀x.¬ab aspect1 x ⊃ ¬flies x\leql{ki}$$
%
and
%leql{k2}
$$∀x.bird x ∧ ¬ab aspect2 x ⊃ ab aspect1 x.\leql{kii}$$
The intent is clear. The goal is that being a bird and not
abnormal in $aspect2$ prevents the application of (\eqref{ki}). However,
circumscribing $ab z$ with the conjunction of (\eqref{ki}) and
(\eqref{kii}) as $A(ab)$ doesn't have this effect, because (\eqref{kii})
is equivalent to
%leql{k3}
$$∀x.bird x ⊃ ab aspect1 x ∨ ab aspect2 x,\leql{kiii}$$
%
and there is no indication that one would prefer to have $aspect1 x$
abnormal rather than to have $aspect2 x$ abnormal. Circumscription then results
in a disjunction which is not wanted in this case. The need to avoid this
disjunction is why the axioms in section \whether\ included cancellation
of inheritance axioms.
However, by using a new kind of ordering we can leave (\eqref{ki})
and (\eqref{kii}) as is, and still get the desired effect.
We define two orderings on $ab$ predicates, namely
%leql{k4}
$$∀ab ab↑\prime.ab ≤↓1 ab↑\prime ≡ ∀x.ab aspect1 x ⊃ ab↑\prime aspect1
x\leql{kiv}$$
%
and
%leql{k5}
$$∀ab ab↑\prime.ab ≤↓2 ab↑\prime ≡ ∀x.ab aspect2 x ⊃
ab↑\prime aspect2 x.\leql{kv}$$
%
We then combine these orderings
lexicographically giving $≤↓2$ priority over $≤↓1$ getting
%leql{k6}
$$∀ab ab↑\prime.ab ≤↓{1<2} ab↑\prime ≡ ab ≤↓2 ab↑\prime
∧ (ab =↓2 ab↑\prime ⊃ ab ≤↓1 ab↑\prime).\leql{kvi}$$
%
Choosing $ab0$ so as to minimize this ordering yields the result
that exactly birds can fly. However, if we add
%leql{k7}
$$∀x.ostrich x ⊃ ab aspect2 x,\leql{kvii}$$
%
we'll get
that ostriches (whether or not ostriches are birds) don't fly without
further axioms. If we use
%leql{k8}
$$∀x.ostrich x ∧ ¬ab aspect3 x ⊃ ab aspect2 x\leql{kviii}$$
%
instead of (\eqref{kvii}), we'll have to revise our
notion of ordering to put minimizing $ab aspect3 x$ at higher priority
than minimizing $ab aspect2 x$ and {\it a fortiori} at higher priority than
minimizing $ab aspect1 x$.
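
For finite extensions the ordering (\eqref{kvi}) is easy to compute. The
following Python sketch, with our own representation of $ab$ as one set of
abnormal objects per aspect, compares the two candidate models that
(\eqref{ki}) and (\eqref{kii}) leave open for a single bird; minimization
under the lexicographic ordering selects the model in which $aspect1$ is
abnormal, so the bird flies.

# A sketch of the lexicographic ordering (kvi) on finite ab extensions,
# each represented as a dict from aspect number to a set of abnormal objects.

def leq_i(ab, ab2, i):
    # (kiv)/(kv): componentwise inclusion at aspect i
    return ab.get(i, set()) <= ab2.get(i, set())

def eq_i(ab, ab2, i):
    return ab.get(i, set()) == ab2.get(i, set())

def leq_lex(ab, ab2):
    # (kvi): aspect 2 has priority; aspect 1 breaks ties
    return leq_i(ab, ab2, 2) and (not eq_i(ab, ab2, 2) or leq_i(ab, ab2, 1))

# The two models left open by (ki) and (kii) for a single bird Tweety:
m1 = {1: {"Tweety"}, 2: set()}    # intended: aspect1 abnormal, so it flies
m2 = {1: set(), 2: {"Tweety"}}    # unintended: aspect2 abnormal
print(leq_lex(m1, m2), leq_lex(m2, m1))   # True False: m1 is preferred
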
This suggests providing a partial ordering on aspects giving their
priorities and providing axioms that permit deducing the ordering on $ab$
from the sentences that describe the ordering relations.
Lifschitz (1985) further develops the idea of prioritized circumscription.
I expect that {\it prioritized circumscription} will turn out to
be the most natural and powerful variant.
Simple abnormality theories also seem to be inadequate
for the blocks world described in section 11. I am indebted to
Lifschitz for the following example. Consider
%
$$S2 = result(move(B,top A),result(move(A, top B),S0)),$$
%
where $S0$ is a situation with exactly blocks $A$ and $B$ on the table.
Intuitively, the second action $move(B, top A)$ is unsuccessful,
because after the first action $A$ is on $B$, and so $B$ isn't
clear. Suppose we provide by a suitable axiom that when the block
to be moved is not clear or the destination place is not clear,
then the situation is normally unchanged. Then $S2$ should be the same
situation as $S1 = result(move(A,top B),S0)$. However, simple circumscription
of $ab$ won't give this result, because the first move is only
normally successful, and if the first move is unsuccessful for
some unspecified reason, the second move may succeed after all.
Therefore, circumscription of $ab$ only gives a disjunction.
Clearly the priorities need to be arranged to avoid this
kind of unintended ``sneak disjunction''. The best way to do it
by imposing priorities isn't clear at the time of this writing.
\section{General Considerations and Remarks}
1. Suppose we have a data base of facts axiomatized by a formalism
involving the predicate $ab$. In connection with a particular problem, a
program takes a subcollection of these facts together with the specific
facts of the problem and then circumscribes $ab z$. We get a second order
formula, and in general, as the natural number example of (McCarthy 1980)
shows, this formula is not equivalent to any first order formula.
However, many common sense domains are axiomatizable in such a way that
the circumscription is equivalent to a first order formula. In this case
we call the circumscription collapsible. For example, Vladimir Lifschitz
(1985) has shown that this is true if the axioms are from a certain class
he calls ``separable'' formulas. This can presumably be
extended to other cases in which the ranges and domains of the functions
are disjoint, so that there is no way of generating an infinity of
elements.
Circumscription is also collapsible when the predicates are
all monadic and there are no functions.
2. We can then regard the process of deciding what facts to take
into account and then circumscribing as a process of compiling from
a slightly higher level nonmonotonic language into mathematical
logic, especially first order logic. We can also regard natural
language as higher level than logic. However, as I shall discuss
elsewhere, natural language doesn't have an independent reasoning
process, because most natural language inferences involve suppressed
premisses which are not represented in natural language in the minds
of the people doing the reasoning.
Reiter has pointed out, both informally and implicitly in
(Reiter 1982), that circumscription often translates directly into a
Prolog program once it has been decided what facts to take into
account.
3. Circumscription has interesting relations to Reiter's (1980)
logic of defaults. In simple cases they give the same results.
However, a computer program using default logic would have to establish
the existence of models, perhaps by constructing them, in order to
determine that the sentences mentioned in default rules were consistent.
Such computations are not just selectively applying the rules of
inference of logic but are metamathematical. At present this is
treated entirely informally, and I am not aware of any computer
program that finds models of sets of sentences or even
interacts with a user to find and verify such models.
Circumscription works entirely within logic as Appendices A
and B illustrate. It can do this, because it uses second order logic to
import some of the model theory of first order formulas into the theory
itself. Finding the right substitution for the predicate variables is, in
the cases we have examined, the same task as finding models of a first
order theory. Putting everything into the logic itself is
an advantage as long as there is neither a good theory of how to construct
models nor programs that do it.
Notice, however, that finding an interpretation of a language has
two parts --- finding a domain and interpreting the predicate and
function letters by predicates and functions on the domain. It
seems that the second is easier to import into second order logic
than the first. This may be why our treatment of unique names is awkward.
4. We are only part way to our goal of providing a formalism
in which a database of common sense knowledge can be expressed.
Besides sets of axioms involving $ab$, we need ways of specifying
what facts shall be taken into account and what functions and
predicates are to be taken as variable.
Moreover, some of the
circumscriptions have unwanted conclusions, e.g. that there are no
ostriches if none are explicitly mentioned. Perhaps some of this
can be fixed by introducing the notion of present situation. An
axiom that ostriches exist will do no harm if what is allowed to
vary includes only ostriches that are present.
5. Nonmonotonic formalisms in general, and circumscription in
particular, have many as yet unrealized applications to formalizing
common sense knowledge and reasoning. Since we have to think about
these matters in a new way, what the applications are and how to
realize them isn't immediately obvious. Here are some suggestions.
When we are searching for the ``best'' object of some kind,
we often jump to the conclusion that the best we have found so far
is the best. This process can be represented as circumscribing
$better(x,candidate)$, where $candidate$ is the best we have found
so far. If we attempt this circumscription while including
certain information in our axiom $A(better,P)$, where $P$ represents
additional predicates being varied, we will succeed in showing
that there is nothing better only if this is consistent with
the information we take into account. If the attempt to circumscribe
fails, we would like our reasoning program to use the failure
as an aid to finding a better object. I don't know how hard
this would be.
%Add a section about the virtues of reification including how
%reifying goals permits $achievable(s,g) ⊃ ab aspect77(s,g)$.
\noindent{\bf Acknowledgments}
\bigskip
\nobreak
I have had useful discussions with Matthew Ginsberg,
Benjamin Grosof,\break Vladimir Lifschitz and Leslie Pack.
The work was partially
supported by NSF and by DARPA. I also thank Jussi Ketonen
for developing EKL and helping me with its use. In particular
he greatly shortened the unique names proof.
\bigbreak
}
\centerline{\bf Appendix D. Formal Theories of Action}
\bigskip
\centerline{Vladimir Lifschitz}
\vskip .4 in
\settabs 10 \columns
ABSTRACT. We apply circumscription to
formalizing reasoning about the effects of actions
in the framework of the situation calculus. An axiomatic description
of causal connections between actions and changes allows us to
solve the qualification problem and the frame problem using only simple
forms of circumscription. The method is applied to the Hanks---McDermott
shooting problem and to a blocks world in which blocks can be moved and painted.
\bigskip
\noindent{\bf 1. Introduction}
\bigskip
We consider the problem of formalizing reasoning about action in the framework of
the {\sl situation calculus} of McCarthy and Hayes (1969).
A {\sl situation} is the complete
state of the universe at an instant of time. A {\sl fluent} is a function defined
on situations. For instance, in the blocks world, the location of a given block
$x$ is a fluent whose values are the possible locations of blocks. In the
language of the
situation calculus, the value of this fluent at $s$ for a block $x$ is denoted by
$location(x,s)$. If $s$ is a situation and $a$ an {\sl action}
then $result(a,s)$ stands for the situation that results when the action $a$
is carried out starting in the situation $s$. Here is a typical
situation calculus formula:
%
$$location(A,result(move(A,Table),S0))=Table.$$
%
Formulas of the situation calculus may also contain variables and logical symbols
and thus can express general facts about the effects of actions.
We are primarily interested in the form of reasoning
which Hanks and McDermott (1986) call ``temporal projection'':
given some description of the current situation, some description of the
effects of possible actions, and a sequence of actions to be performed,
predict the properties of the world in the resulting situation.
A number of difficulties arise in attempts to
describe the effects of actions in the situation calculus and in similar
formalisms. One of them, known as the {\sl qualification
problem}, is related to the fact that the successful
performance of an action may depend on a large number of qualifications. As a
consequence, the axiom describing the effect of such an action has to include a
long list of assumptions. It is desirable to treat each qualification as a
separate fact and describe it by a separate axiom.
Another difficulty is that, in addition to the
axioms describing how the properties of the current situation change when an action
is performed, we need a large number of axioms listing the properties which are
{\sl not} affected by a given action. This is called the {\sl frame problem}.
For instance, when a robot is moving a block, appropriate ``frame axioms'' must
guarantee that all the properties of the situation which are not related to the
positions of the robot and the block (for instance, the positions of all other
objects) remain the same.
John McCarthy proposed approaching the qualification problem and the frame
problem using {\sl circumscription} (McCarthy 1980, 1986). We
assume that the reader has some familiarity with this concept.
(For the definition, see (McCarthy 1986) or (Lifschitz 1985). Here we apply
circumscription to many-sorted languages; this generalization is straightforward.)
With regard to the
qualification problem, the idea is, instead of listing all the qualifications in the
antecedent of one axiom, to condition the axiom on the fact
that (a certain aspect of) the
action is not ``abnormal'' in the
situation in which the action is performed. For instance,
the effect of moving a block $x$ to a location $l$ is described in
(McCarthy 1986) by
%
$$¬ ab\;aspect3(x,l,s)⊃location(x,result(move(x,l),s))=l.$$
%
For each qualifying condition,
a separate axiom postulates that the condition implies the
abnormality of this aspect of the action, for instance:
%
$$¬clear(top\;x,s) ⊃ ab\;aspect3(x,l,s).$$
%
By circumscribing $ab$, we ``minimize abnormality'', which should ensure that
the action has its expected effect whenever we are not in any of those exceptional
situations which, according to the axioms, make the action abnormal.
For the frame problem, similarly, McCarthy proposes an axiom which guarantees
that the location of $x$ in the situation $result(a,s)$ is the same as its location
in the situation $s$ unless a certain aspect of $a$ is abnormal. Then
minimizing $ab$ should allow us to prove that $x$ does not change its position
if the action consists of moving a block other than $x$, or in painting a block,
etc., so that all ``frame axioms'' should become provable.
Further analysis has shown, however, that the result of minimizing abnormality
relative to McCarthy's axiom set is not quite as strong as necessary for the
successful formalization of the blocks world. The difficulty can be illustrated by
the following example (McCarthy 1986). Consider the
situations
%
$$S1=result(move(A,top\;B),S0)$$
%
and
%
$$S2=result(move(B,top\;A),S1),$$
%
where $S0$ is a situation with exactly blocks $A$ and $B$ on the table.
We expect circumscription to guarantee that the first action, $move(A,top\;B)$,
is ``normal'', so that $A$ is on top of $B$ in situation $S1$. In view of
that, the second action is ``abnormal'' (so that the positions of the blocks in
situation $S2$ will be impossible to predict on the basis of McCarthy's
axioms). However, if the first move is ``abnormal'' (for some unspecified
reason), the positions of the blocks after the first action may remain unchanged,
so that the second move may succeed after all. This observation shows that
the intended model of McCarthy's axioms is not the only one in which the extension
of $ab$ is minimal. In the intended model, the first action is normal, and the
second is not; in the other model, it is the other way around. Using
circumscription, we can only prove the common properties of all minimal models;
minimizing $ab$ will only give a disjunction.
Another example of such an ``overweak disjunction'',
for an even simpler system of actions, is discussed in
(Hanks and McDermott 1986). (See also (Hanks and McDermott 1985) and
(McDermott 1986), where formulations not based on the situation calculus are
given.) There are three possible actions: $load$ (a gun), $wait$, and $shoot$ (an
individual, whose name, according to some accounts, is Fred).
Normally, waiting does not cause any changes in the world, but
shooting leads to Fred's death, provided, of course, that the gun is loaded.
Assume that all three actions are performed, in the given order.
An axiom set similar to McCarthy's axioms for the blocks world is shown to have
an unintended minimal model, in which the gun mysteriously gets unloaded on the
waiting stage, so that the shooting does not kill Fred. In the
intended model, $wait$ does not change the world, but $shoot$ does; in the
unintended model, it is the other way around. In either case, $ab$ is minimal.
Here again, the result of circumscription is too weak.
It is easy to notice that, in both examples, there
is a conflict between minimizing abnormality at two instants of time.
In the intended model, minimization
at the earlier instant is preferred. But circumscription knows nothing about time;
it interprets minimality of a predicate as the minimality of its extension
relative to set inclusion. This, according to Hanks and McDermott, is the
reason why circumscription does not live up to our expectations in application
to temporal reasoning:
``$\dots$ the class of models we want our logic to select is not the ``minimal
models'' in the set-inclusion sense of circumscription, but the
``chronologically minimal'' (a term due to Yoav Shoham): those in which
normality assumptions are made in chronological order, from earliest to latest,
or, equivalently, those in which abnormality occurs as late as possible''
(Hanks and McDermott 1986).
The paper by Hanks and McDermott was presented at the AAAI conference in 1986, and
three more talks given at the same conference,
(Kautz 1986), (Lifschitz 1986) and (Shoham 1986), met their challenge
by formalizing, in somewhat different contexts, the idea of chronological
minimization. As could be
expected, all of them had to use forms of logical minimization other than
circumscription in the sense of (McCarthy 1986).
In this paper we argue that formalizing chronological minimization is not the
easiest way to solve the temporal projection problem. We start with
another view on the difference between the intended and the unintended
models of the axioms for actions: in the intended model, {\sl all changes in the
values of fluents are caused by actions}, whereas in other models this is
not the case. To express this distinction formally, we add new primitive
predicates to the language. With this
different choice of primitive concepts, the effects of actions can
be characterized by simple axioms in the language of the situation calculus
plus traditional circumscription. The axiomatic treatment of causality is the
main distinctive feature of our approach.
The method is illustrated by formalizing two examples: the shooting problem
and the blocks world. All fluents dealt with in these examples are truth-valued,
but the method can be easily extended to fluents of other kinds.
Metamathematical theorems are presented demonstrating that
the formalizations are consistent and reasonably complete. We also show that
the result of circumscription in these applications can be determined by simple
syntactic methods, such as predicate completion. This allows us, in each case,
to prove the correctness of a natural procedure for determining the effect of
a given sequence of actions.
\bigskip
\bigskip
\noindent{\bf 2. The Yale Shooting Problem}
\bigskip
To formalize the shooting story from the Hanks---McDermott papers,
we use object variables of four sorts: for truth values ($v$), for actions ($a$),
for situations ($s$), and for truth-valued fluents ($f$). The object constants
are: truth values $false$ and $true$, actions $load$, $wait$ and $shoot$,
situation $S0$ and fluents $loaded$ and $alive$.
The binary situation-valued
function $result$ must have an action term and a situation term as arguments.
Finally, there are three predicate constants:
$holds(f,s)$ expresses that $f$ is true in the situation $s$;
$causes(a,f,v)$ expresses that $a$ causes the fluent $f$ to take on the value $v$;
$precond(f,a)$ expresses that $f$ is a precondition for the successful
execution of $a$.\footnote{$↑*$}{Notice that, syntactically, the variables
for actions are {\sl object} variables, not function variables, even though
each action $a$ has a function associated with it: the situational fluent
$s \mapsto result(a,s)$. Similarly, the variables for truth-valued fluents are
object variables, not predicate variables. The value of $f$ at $s$ is
represented by $holds(f,s)$.}
Our axioms for the shooting problem can be classified into three groups. The first
group
describes the initial situation:
%
$$holds(alive,S0),\eqno(Y1.1)$$
$$¬ holds(loaded,S0).\eqno(Y1.2)$$
%
The second group tells us how the fluents are affected by actions:
%
$$causes(load,loaded,true),\eqno(Y2.1)$$
$$causes(shoot,loaded,false),\eqno(Y2.2)$$
$$causes(shoot,alive,false).\eqno(Y2.3)$$
%
These axioms describe the effects of
{\sl successfully performed} actions;
they do not say {\sl when} an action can be successful.
This information is supplied separately:
%
$$precond(loaded,shoot).\eqno(Y2.4)$$
%
The last group consists of two axioms of a more general nature. We use
the abbreviations:
%
$$success(a,s) ≡ ∀f(precond(f,a) ⊃ holds(f,s)),$$
$$affects(a,f,s) ≡ success(a,s) ∧ ∃v\;causes(a,f,v).$$
%
One axiom describes how the value of a fluent changes after an action affecting
this fluent is carried out:
%
$$success(a,s) ∧ causes(a,f,v) ⊃ (holds(f,result(a,s)) ≡ v=true).\eqno(Y3.1)$$
%
(Recall that $v$ can take on two values here, $true$ and $false$; the
equivalence in $Y3.1$ reduces to $holds(f,result(a,s))$ in the first case and
to the negation of this formula in the second.)
If the fluent is not affected then its value remains the same:
%
$$¬affects(a,f,s) ⊃ (holds(f,result(a,s)) ≡ holds(f,s)).\eqno(Y3.2)$$
This axiom set obviously does not include a number of assumptions that may be
essential. The axioms do not
tell us, for instance, whether $false$ and $true$ are different from
each other and whether there are any truth values other than these two; we
do not know whether $load$, $shoot$ and $wait$ are three different actions, etc.
In this preliminary discussion, instead of making all these assumptions
explicit, we limit our attention to
the {\sl term models} of the axioms,
in which every element of the universe is
represented by a ground term of the language, and different terms represent
different objects. (Such models are also called {\sl Herbrand models} in the
case of universal theories).
We will identify the universe of a term model with
the set of ground terms.
It remains to specify how circumscription is used. We circumscribe $causes$ and
$precond$, with the remaining predicate $holds$ allowed to vary, relative to the
conjunction $Y$ of the axioms $Y1.1$---$Y3.2$.
This will lead us to
the conclusion that $causes$ and $precond$ are true only when this is required by
the axioms of the second group, $Y2$.
The minimization of $causes$ will imply, in view of axiom
$Y3.2$, that the only changes taking place in the world are those corresponding to
axioms $Y2.1$---$Y2.3$. This solves the frame problem.
The minimization of $precond$,
in view of the definition of $success$, will imply that the only unsuccessful
actions are those in which precondition $Y2.4$ is violated.
This solves the qualification
problem.
To illustrate this last point, assume for a moment that we would like to make the
problem one step closer to the reality of assassination attempts as we know them in
our everyday life and take into account that one has to have bullets available in
order to load a gun. To introduce this qualification, we would simply
add the axiom
$$precond(bulletsavailable,load)$$
to group $Y2$, where $bulletsavailable$ is a new fluent. (We may also wish to
introduce an initial condition for this fluent and actions affecting its
value, such as $stealbullets$, $losebullets$, etc.)
We will show in the next section that there is exactly one term model of
the circumscription described,
and that in this model the formula
%
$$holds(alive,result(shoot,result(wait,result(load,S0))))$$
%
is false.
The main technical difference between this formulation and other
attempts to apply circumscription to the situation calculus is that the
predicates which we circumscribe,
$causes$ and $precond$, are {\sl not fluents}, they
do not have situation arguments. The predicate $success$, which does have a
situation argument, is {\sl defined} in terms of the minimized primitive
$precond$. Since the minimized predicates have no situation arguments, the
conflict between minimizing in earlier and later instants of time
simply cannot arise.
\bigskip
\bigskip
\noindent{\bf 3. The Minimal Model of $Y$}
\bigskip
In this section, by a {\sl model} we understand a term model of $Y$; a model is
{\sl minimal} if it satisfies the result of circumscribing $causes$ and
$precond$ in $Y$ with $holds$ allowed to vary.
The universe of any model consists of the ground terms of the
language of $Y$, which include:
\item{(i)}
truth values $false$ and $true$,
\item{(ii)}
actions $load$, $wait$ and $shoot$,
\item{(iii)}
fluents $loaded$ and $alive$,
\item{(iv)}
infinitely many situations, which can be visualized as the nodes of
a finitely branching tree:
\vfil\eject
\+ & &&&$result(load,result(load,S0))$ &&&&&$\dots$\cr
\+ &$result(load,S0)$ &&&$result(wait,result(load,S0))$ &&&&&$\dots$\cr
\+ & &&&$result(shoot,result(load,S0))$ &&&&&$\dots$\cr
\medskip
\+ & &&&$result(load,result(wait,S0))$ &&&&&$\dots$\cr
\+$S0$ &$result(wait,S0)$ &&&$result(wait,result(wait,S0))$ &&&&&$\dots$\cr
\+ & &&&$result(shoot,result(wait,S0))$ &&&&&$\dots$\cr
\medskip
\+ & &&&$result(load,result(shoot,S0))$ &&&&&$\dots$\cr
\+ &$result(shoot,S0)$ &&&$result(wait,result(shoot,S0))$ &&&&&$\dots$\cr
\+ & &&&$result(shoot,result(shoot,S0))$ &&&&&$\dots$\cr
\bigskip\noindent
We use the letters {\bf v}, {\bf a}, {\bf f}, {\bf s}
as metavariables for ground terms of types (i)---(iv), respectively.
A model is defined by interpreting the predicate constants $causes$, $precond$
and $holds$ on this universe, or, equivalently, by assigning a truth value to
each ground atom.
Consider the Boolean function ${\cal T}$ defined on the set of
ground atoms by the following recursive equations:
$$\eqalignno{
{\cal T}[causes({\bf a},{\bf f},{\bf v})]
=\quad&({\bf a}=load∧{\bf f}=loaded∧{\bf v}=true)\cr
∨&({\bf a}=shoot∧{\bf f}=loaded∧{\bf v}=false)&(1)\cr
∨&({\bf a}=shoot∧{\bf f}=alive∧{\bf v}=false),\cr
{\cal T}[precond({\bf f},{\bf a})]=\quad&{\bf f}=loaded∧{\bf a}=shoot,&(2)\cr
{\cal T}[holds(loaded,S0)]=\quad&false,&(3)\cr
{\cal T}[holds(alive,S0)]=\quad&true,&(4)\cr
{\cal T}[holds({\bf f},result({\bf a},{\bf s}))]=\quad&
\cases{{\bf v},&if ${\cal S}[{\bf a},{\bf s}]
∧{\cal T}[causes({\bf a},{\bf f},{\bf v})]$,\cr
{\cal T}[holds({\bf f},{\bf s})],&otherwise,\cr}&(5)\cr
}$$
where
$${\cal S}[{\bf a},{\bf s}]=\bigwedge↓{{\bf f} : {\cal T}
[precond({\bf f}, {\bf a})]}{\cal T}[holds({\bf f},{\bf s})].\eqno(6)$$
The correctness of clause (5)
follows from the fact that $causes({\bf a},{\bf f},{\bf v})$
is never satisfied for more than one value of {\bf v}.
Define $M↓0$ as the model in which every ground atom {\bf A} has the truth
value ${\cal T}[{\bf A}]$. It is clear from the definitions of $success$
and $\cal S$ that ${\cal S}[{\bf a}, {\bf s}]=true$ if and only if the
formula $success({\bf a}, {\bf s})$ is true in $M↓0$. It is easy to check using
this fact that each axiom of $Y$ is indeed true in $M↓0$.
Furthermore, axioms $Y2$ show that the extension of each of the predicates
$causes$ and $precond$ in any
model is a superset of its extension in $M↓0$. This means that $M↓0$ is minimal,
and in any other minimal model $causes$ and $precond$ are interpreted in the
same way as in $M↓0$. It remains to notice that, in view of axioms $Y1$ and $Y3$,
equations (3)---(5), which define ${\cal T}$ on atoms of the form $holds({\bf f},
{\bf s})$, are satisfied for any function which computes the truth values of
these atoms in some model of $Y$;
consequently, $holds$ also has to be interpreted in any minimal model as in $M↓0$.
We have proved that $M↓0$ is the only minimal model.
Equations (1)---(5) give an
algorithm for evaluating any ground atom in the model $M↓0$.
According to (1) and (2), the predicates $causes$ and $precond$ can be
described in $M↓0$ by these explicit definitions:
$$
\eqalign{
causes(a,f,v)≡\quad&(a=load∧f=loaded∧v=true)\cr
∨&(a=shoot∧f=loaded∧v=false)\cr
∨&(a=shoot∧f=alive∧v=false),\cr
precond(f,a)≡\quad&(f=loaded∧a=shoot).\cr
}\eqno(7)
$$
These definitions can be obtained by applying the
``predicate completion'' procedure (Clark 1978)
to axioms $Y2$. They can be also characterized
as the result of circumscribing the predicates
$causes$ and $precond$ relative to $Y2$.
These predicates occur both in $Y2$ and in $Y3$, but we see that
axioms $Y3$ can be ignored when the result of circumscription is determined.
Equations (3)---(5) can be interpreted as the description of a deterministic
finite automaton $\cal A$ with 4 internal states, corresponding to all possible
combinations of values of the fluents $loaded$ and $alive$. The input
symbols of ${\cal A}$ are
$load$, $wait$, $shoot$. Formulas (3) and (4) define
the initial state of $\cal A$ to be $(false,true)$, and (5)
defines its transition function. Strings of input symbols are ``plans'',
and the goal condition of a planning problem would define a set of final states.
For example, the goal $¬alive$ corresponds to selecting
$(false,false)$ and $(true,false)$ as the final states;
the plan $load$, $wait$, $shoot$ is accepted by
${\cal A}$ with the set of final states defined in this way.
Each state of ${\cal A}$ corresponds to a set of situations which are
indistinguishable in terms of fluents (iii). The states of ${\cal A}$ can be
identified with the equivalence classes of the space of situations
relative to the relation
$$\bigwedge↓{\bf f}holds({\bf f},s)≡holds({\bf f},s'),$$
where {\bf f} ranges over the set of fluent constants $\{loaded,alive\}$.
These fluents form a ``coordinate frame'' in the quotient set of
the space of situations relative to this equivalence relation.
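
Equations (1)---(6) can be executed directly. The following Python sketch
(our encoding, with a situation represented by the tuple of actions leading
to it from $S0$) evaluates ground atoms exactly as ${\cal T}$ prescribes and
reproduces the temporal projection for the shooting story.

# Executable reading of equations (1)-(6): a situation is represented by
# the tuple of actions leading to it from S0 (encoding is ours).

CAUSES = {("load", "loaded"): True,           # (1)
          ("shoot", "loaded"): False,
          ("shoot", "alive"): False}
PRECOND = {"shoot": ["loaded"]}               # (2)
HOLDS_S0 = {"loaded": False, "alive": True}   # (3), (4)

def success(action, sit):                     # (6)
    return all(holds(f, sit) for f in PRECOND.get(action, []))

def holds(fluent, sit):                       # (5), by recursion on sit
    if not sit:
        return HOLDS_S0[fluent]
    rest, action = sit[:-1], sit[-1]
    if success(action, rest) and (action, fluent) in CAUSES:
        return CAUSES[(action, fluent)]
    return holds(fluent, rest)

print(holds("alive", ("load", "wait", "shoot")))  # False: the shot succeeds
print(holds("alive", ("wait", "shoot")))          # True: the gun is unloaded
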
\bigskip
\bigskip
\noindent{\bf 4. An Alternative Formulation}
\bigskip
The formalization of the Yale shooting problem proposed above has two defects.
First, we were only able to show that actions lead to the expected results in
{\sl term models} of the circumscription,
not in arbitrary models. Second, we exploited
some special features of the problem that are not present in more complex
problems of this kind. Now we will consider an improved, though somewhat less
economical, formalization of the shooting story. Here are the main differences
between the two versions.
1. Since it is essential in the first solution that, in a term model, different
ground terms have different values, we will need some {\sl uniqueness of names}
axioms, expressing this property.
The following notation will be useful: If $c↓1,\dots,c↓n$ are
constants of the same type then $U[c↓1,\dots,c↓n]$ stands for the set of
axioms $c↓i≠c↓j$ $(1≤i<j≤n)$. For instance, $U[load,wait,shoot]$ is
$$load≠wait,\;load≠shoot,\;wait≠shoot.$$
2. It is also essential in the first solution that every situation in a term
model is ``in the future'' of the situation $S0$, i.e., is
the result of executing a sequence of actions in $S0$.\footnote{$↑*$}{This
fact was used to define the interpretation of $holds$ in $M↓0$
by recursion on the
second argument.} We see several ways to deal with this problem, all
of them, admittedly, somewhat unintuitive. In the new solution we choose
to restrict axioms $Y3$ to a subclass of
{\sl relevant} situations; we will postulate that $S0$ is relevant and that
the result of executing an action in a relevant situation is relevant also.
3. The only precondition in the Yale shooting problem (given by axiom $Y2.4$)
is extremely simple: $loaded$ is one of the ``coordinate fluents'' defining the
state of the corresponding automaton (see the end of Section 3). Even in the
blocks world, we need preconditions of a more complex nature, such as
$clear\;l$ and $clear\;top\;b$
for the action $move(b,l)$, where
$clear$ is explicitly defined in terms of primitive fluents.
Accordingly, in the second solution we explicitly distinguish between
fluents in the general sense and primitive fluents. Only a
primitive fluent can occur as the second argument of $causes$,
but preconditions do not have to be primitive.
4. The value which $a$ causes $f$ to take on may depend on the situation in
which $a$ is executed. For instance, toggling a switch may cause its status to
change in two different ways, depending on its current
status.\footnote{$↑{**}$}{This generalization
was suggested by Michael Georgeff.} Similarly, an
assignment statement which has variables in the right-hand side causes the value
of the left-hand side to change in different ways depending on the current state
of computation. Accordingly, we will allow a truth-valued fluent (not necessarily
primitive), rather than a truth value, to be the last argument of $causes$; the
value of that fluent in the situation $s$ is the value that $f$ will take on
in the situation $result(a,s)$, provided $a$ is successful. The effect of
toggling a switch can be described then by an axiom such as
$$causes(toggle,\,on,\,not\;on)$$
($not$ is the function which transforms a fluent into its negation),
and the effect of an assignment by
$$causes(var:=exp,\,var,\,exp).$$
In the new formalization of the Yale shooting
we have an additional sort of variables, primitive
fluents. On the other hand, we do not need variables for truth values
any more. Thus the language has variables for actions ($a$), for situations ($s$),
for truth-valued fluents ($f$), and for primitive truth-valued fluents ($p$).
This last sort is considered a subtype of fluents, so that a term
representing a primitive fluent can be used wherever the syntax requires
a fluent. (Accordingly, in a structure for the language the domain of
primitive fluents must be a subset of the domain of fluents).
The object constants are the same as in
the language of Section 2. As before, $load$, $wait$
and $shoot$ are action constants and $S0$ a
situation constant; $loaded$ and $alive$ are primitive fluents and
$false$ and $true$ are fluents (intuitively, the identically false fluent
and the identically true fluent). The only function constant, $result$,
and the predicate constants $holds$ and $precond$ are used
as in the first formulation. The arguments of $causes$ must be an action, a
primitive fluent and a fluent. In addition, we need the unary predicate
$relevant$, whose argument must be a situation term.
%The axiom set contains several postulates which
%have no counterparts in the first formulation. First, we need these
%uniqueness of names axioms:
%
The axioms of the first two groups, $Y1$ and $Y2$ (Section 2) are included
in the new formulation without
change. (It is easy to see that all of them are well-formed formulas of the
new language.) The
definitions of $success$ and $affects$ remain the same, except that variables
of different sorts are required now in the definition of $affects$:
$$affects(a,p,s) ≡ success(a,s) ∧ ∃f\;causes(a,p,f).$$
The counterpart of $Y3.1$ is the following {\sl Law of Change:}
%
$$\eqalign{
relevant\;s ∧ success(a,s) &∧ causes(a,p,f)\cr
&⊃ (holds(p,result(a,s)) ≡ holds(f,s)).
}\eqno(LC)$$
%
$Y3.2$ becomes the {\sl Commonsense Law of Inertia:}
$$relevant\;s ∧ ¬affects(a,p,s) ⊃ (holds(p,result(a,s)) ≡ holds(p,s)).\eqno(LI)$$
In addition, we need a few axioms which have no counterparts in the first
version. Let $Act$ be the list of actions $load$, $wait$, $shoot$, and let $Fl$
be the list of fluents
$$loaded,alive,false,true.$$
We need the following uniqueness of names axioms:
%
$$U[Act],\eqno(U1)$$
$$U[Fl],\eqno(U2)$$
$$result(a1,s1)=result(a2,s2)⊃a1=a2∧s1=s2.\eqno(U3)$$
%
The formula $loaded≠alive$, for instance, one of the axioms $U2$,
means, if fluents are understood extensionally, that
there exists a situation in which the
fluents $loaded$ and $alive$ have different values. Alternatively,
we can accept the ``intensional'' view and think
that fluent variables range over the ``names'' of fluents. (Then, of course,
it is not true that $∀s(holds(f1,s)≡holds(f2,s))$ implies $f1=f2$, and indeed
we have no such axiom.) The same applies to other axioms in $U2$.
Axiom $U3$ represents the assumption that a situation includes a
complete description of its history, i.e., uniquely defines what action has
led to it and in what situation the action was performed.
We also need to assume that the non-primitive fluent constants
$$false, true$$
denote objects which do not belong to the subdomain of primitive fluents.
If this list is denoted by $NPF$ then this assumption can be expressed by
$$p≠{\bf t}\qquad({\bf t}εNPF).\eqno(U4)$$
The next group of axioms consists of the
{\sl fluent definitions} for the non-primitive fluents:
%
$$holds(true,s),\eqno(FD.true)$$
$$¬holds(false,s).\eqno(FD.false)$$
Finally, there are 3 axioms for the new predicate $relevant$:
$$relevant\;S0,\eqno(R1)$$
$$relevant\;s⊃relevant\;result(a,s),\eqno(R2)$$
$$result(a,s)=S0⊃¬relevant\;s.\eqno(R3)$$
Axioms $R1$, $R2$ imply
$$relevant\;{\bf s}\eqno(8)$$
for any situation term {\bf s} without variables.
By $Y'$ we denote the conjunction of the axioms $Y1$, $Y2$, $LC$, $LI$, $U$,
$FD$, $R$.
The predicates $causes$ and $precond$ are circumscribed now
relative to $Y'$, with the
remaining predicates $holds$ and $relevant$ allowed to vary. By a {\sl minimal
model} of $Y'$ we understand any model satisfying this circumscription.
\proclaim Proposition 1. $Y'$ has minimal models.
In fact, $Y'$ has many non-isomorphic minimal models. But in all of them
the predicates $causes$ and $precond$ can be characterized by the same
explicit definitions (essentially identical to their definitions (7)
in the minimal model of $Y$):
\proclaim Proposition 2. Each minimal model of $Y'$ satisfies the conditions
$$\eqalign{
causes(a, p, f)≡\quad&( a=load∧ p=loaded∧ f=true)\cr
∨&( a=shoot∧ p=loaded∧ f=false)\cr
∨&( a=shoot∧ p=alive∧ f=false),\cr
precond( f, a)≡\quad&( f=loaded∧ a=shoot).\cr
}\eqno(7')$$
Let us adapt the definition of ${\cal T}$ (Section 3) to the new language in
the following way:
$$\eqalignno{
{\cal T}[causes({\bf a},{\bf p},{\bf f})]
=&\quad({\bf a}=load∧{\bf p}=loaded∧{\bf f}=true)\cr
&∨({\bf a}=shoot∧{\bf p}=loaded∧{\bf f}=false)&(1')\cr
&∨({\bf a}=shoot∧{\bf p}=alive∧{\bf f}=false),\cr
{\cal T}[precond({\bf f},{\bf a})]=&\quad{\bf f}=loaded∧{\bf a}=shoot,&(2)\cr
{\cal T}[holds(loaded,S0)]=&\quad false,&(3)\cr
{\cal T}[holds(alive,S0)]=&\quad true,&(4)\cr
{\cal T}[holds({\bf p},result({\bf a},{\bf s}))]=&
\cases{{\cal T}[holds({\bf f},{\bf s})],&if ${\cal S}[{\bf a},{\bf s}]
∧{\cal T}[causes({\bf a},{\bf p},{\bf f})]$,\cr
{\cal T}[holds({\bf p},{\bf s})],&otherwise,\cr}&(8')\cr
{\cal T}[holds(false,{\bf s})]=&\quad false,&(9)\cr
{\cal T}[holds(true,{\bf s})]=&\quad true,&(10)\cr
{\cal T}[relevant\;{\bf s}]=&\quad true.&(11)\cr
}$$
Here {\bf p} is a metavariable for the ground terms representing
primitive fluents; as before, ${\cal S}$ is defined by (6). Notice that equations
(3), (4), ($8'$) define $holds$ for the case when the second argument is primitive,
and (9), (10) for the opposite case.
\proclaim Proposition 3. For any ground atom {\bf A} and any minimal model
$M$ of $Y'$, the truth value of {\bf A} in $M$ is ${\cal T}[{\bf A}]$.
We see that any ground atom is either true in all minimal models of $Y'$
simultaneously, or false in all of them,
so that our ``circumscriptive theory'' is sufficiently strong
to decide any sentence without variables.
Proofs of Propositions 1---3 are given in the appendix.
\bigskip
\bigskip
\noindent{\bf 5. The Blocks World}
\bigskip
Now we are ready to consider a slightly larger example: a blocks world in
which blocks can be moved and painted, as in (McCarthy 1986). Even though
the object domain is now entirely different, the
primitive concepts $result$, $holds$, $causes$ and $precond$ will be used
again, and some
axioms (notably, the law of change and the law of inertia) will be included
in the axiom set for the blocks world in exactly the same form. These axioms
represent the general properties of actions which are likely to be useful
for constructing axiomatic descriptions of many different object domains.
As before, we have variables for
actions ($a$), for situations ($s$), for truth-valued fluents ($f$),
and for primitive truth-valued fluents ($p$), the last sort being again a
subtype of fluents. In addition, there are 3 ``domain-specific'' sorts:
blocks ($b$), locations ($l$) and colors ($c$). Object constants:
$S0$, $true$, $false$, blocks $Block↓1$,
$\dots$, $Block↓N$ (where $N$ is a fixed positive integer), location $Table$,
and, finally, colors $Red$, $White$ and $Blue$. Here is the list of
function constants, along with the sorts of their arguments and values:
\medskip
\+&&&$top$ &&$(b→l)$,\cr
\+&&&$at$ &&$(b,l→p)$,\cr
\+&&&$color$ &&$(b,c→p)$,\cr
\+&&&$clear$ &&$(l→f)$,\cr
\+&&&$move$ &&$(b,l→a)$,\cr
\+&&&$paint$ &&$(b,c→a)$,\cr
\+&&&$result$ &&$(a,s→s)$.\cr
\medskip\noindent
The predicate constants are the same as in Section 4: $holds$, $causes$,
$precond$ and $relevant$.
Notice that in this language actions can be represented by ground terms
containing function symbols, such as $move(Block↓1, top\;Block↓2)$ or
$paint(Block↓1,Blue)$, and not merely by object constants, as in the Yale
shooting problem. Still there are only finitely many ground terms representing
actions, so that
only finitely many actions have individual names. The same applies to
primitive fluents. As a result, we will be able to transform the theory of
the blocks world into a description
of this world in terms of a finite automaton, as was done for the
shooting problem in Section 3.
The first group of axioms contains some general facts about blocks, their locations
and colors:
$$b=Block↓1∨\dots∨b=Block↓N,\eqno(B0.1)$$
$$holds(at(b,l1),s)∧holds(at(b,l2),s)⊃l1=l2,\eqno(B0.2)$$
$$holds(color(b,c1),s)∧holds(color(b,c2),s)⊃c1=c2.\eqno(B0.3)$$
Initially, all blocks are white and on the table:
$$holds(at(b,Table),S0),\eqno(B1.1)$$
$$holds(color(b,White),S0).\eqno(B1.2)$$
The effects of actions:
$$causes(move(b,l),at(b,l),true),\eqno(B2.1)$$
$$l≠l1⊃causes(move(b,l),at(b,l1),false),\eqno(B2.2)$$
$$causes(paint(b,c),color(b,c),true),\eqno(B2.3)$$
$$c≠c1⊃causes(paint(b,c),color(b,c1),false),\eqno(B2.4)$$
$$l≠Table⊃precond(clear\;l,move(b,l)),\eqno(B2.5)$$
$$precond(clear\;top\;b,move(b,l)),\eqno(B2.6)$$
$$precond(false,move(b,top\;b)).\eqno(B2.7)$$
(It is impossible to move a block onto its own top.)
Axioms $B2.2$ and $B2.4$ may seem redundant: we already know from axioms
$B0.2$ and $B0.3$ that a block cannot possibly be at two places or have two
colors simultaneously. But our method requires
that all changes of all primitive fluents caused by an action be described
explicitly. It does not solve the ``ramification problem'' in the sense of
(Finger 1986) and (Ginsberg and Smith 1986).\footnote{$↑*$}{This fact
was pointed out by Matthew Ginsberg.}
The laws of change and inertia $LC$ and $LI$ are formulated as in $Y'$ above.
To state the counterparts of the uniqueness of names axioms from $Y'$,
we define $Act$ and $Fl$ for the blocks world language as follows:
$$Act=(move, paint),$$
$$Fl=(at,color,true,false,clear).$$
%
Now the list $Act$ consists of the symbols for {\sl functions} returning actions,
not of object constants, as in $Y'$. $Fl$ contains both object constants and
function constants.
We will extend the definition of $U[\dots]$ to the case when some or all
of its arguments are functions. We can treat constants as 0-ary
functions and thus define the meaning of $U[f↓1,\dots,f↓n]$, where $f↓1$,
$\dots$, $f↓n$ are functions returning values of the same sort. By definition,
this stands for the following set of axioms:
%
$$f↓ix↓1\dots x↓k≠f↓jy↓1\dots y↓l$$
for all $i<j$, and
$$f↓ix↓1\dots x↓k=f↓iy↓1\dots y↓k⊃(x↓1=y↓1∧\dots∧x↓k=y↓k),$$
for all $f↓i$ of arity $k>0$, where $x↓1$, $\dots$, $y↓1$, $\dots$ are
variables of appropriate sorts. These axioms express that $f↓1$, $\dots$, $f↓n$ are
injections with disjoint ranges. If each $f↓i$ is an object constant then we
get the special case defined in Section 4 above. Example: Axiom $U3$ can be written
in this notation as $U[result]$.
$NPF$ is defined as the list of terms
$$false,\;true,\;clear\;l.$$
Now axioms $U1$---$U4$ are expressed exactly as in Section 4. In addition, we
have 3 domain-specific uniqueness axioms:
$$U[Block↓1,\dots,Block↓N],\eqno(U.block)$$
$$U[Table,top],\eqno(U.location)$$
$$U[Red,White,Blue].\eqno(U.color)$$
The fluent definitions include, in addition to $FD.true$ and $FD.false$,
the definition of $clear$:
%
$$holds(clear\;l,s)≡∀b¬holds(at(b,l),s).\eqno(FD.clear)$$
%
Finally, the axioms $R$ are formulated as in $Y'$.
By $B$ we denote the conjunction of all these axioms. A {\sl minimal}
model of $B$ is a model of the circumscription of $causes$ and $precond$
relative to $B$ with $holds$ and $relevant$ allowed to vary.
Properties of these models are similar to properties of
the minimal models of $Y'$: Every ground atom has the same
truth value in all minimal models of $B$, and the values of the atoms
of the form $holds({\bf p},{\bf s})$ can be found by
a recursive process describing the work of a deterministic finite automaton.
More specifically, we can prove the following counterparts of Propositions 1---3.
\proclaim Proposition 4. $B$ has minimal models.
\proclaim Proposition 5. Each minimal model of $B$ satisfies the conditions
$$\eqalign{
causes(a, p, f)≡∃b,l&,l1,c,c1[(a=move(b,l)∧p=at(b,l)∧f=true)\cr
∨&(a=move(b,l)∧p=at(b,l1)∧l≠l1∧f=false)\cr
∨&(a=paint(b,c)∧p=color(b,c)∧f=true)\cr
∨&(a=paint(b,c)∧p=color(b,c1)∧c≠c1∧f=false)],\cr
precond(f,a)≡∃b,l&[a=move(b,l)∧((f=clear\;l∧l≠Table)\cr
&\qquad\qquad\qquad\qquad∨f=clear\;top\;b\cr
&\qquad\qquad\qquad\qquad∨(l=top\;b∧f=false))].\cr
}\eqno(12)$$
Notice that here, as in the Yale shooting problem, the explicit definitions of
$causes$ and $precond$ are obtained simply by applying predicate completion
to a subset of axioms (or, equivalently, by circumscribing $causes$ and $precond$
relative to that subset).
To state the analogue of Proposition 3, we define the operator ${\cal T}$
for the blocks world by the equations
$$\eqalignno{
{\cal T}[causes(move({\bf b},{\bf l}),{\bf p},{\bf f})]=&
\cases{{\bf f}=true, &if ${\bf p}=at({\bf b},{\bf l})$,\cr
{\bf f}=false,&if ${\bf p}=at({\bf b},{\bf l1})$, ${\bf l}≠{\bf l1}$,\cr
false, &otherwise,\cr}&(13)\cr
{\cal T}[causes(paint({\bf b},{\bf c}),{\bf p},{\bf f})]=&
\cases{{\bf f}=true, &if ${\bf p}=color({\bf b},{\bf c})$,\cr
{\bf f}=false,&if ${\bf p}=color({\bf b},{\bf c1})$, ${\bf c}≠{\bf c1}$,\cr
false, &otherwise,\cr}&(14)\cr
{\cal T}[precond({\bf f},move({\bf b},{\bf l}))]=
&\quad(({\bf f}=clear\;{\bf l}∧{\bf l}≠Table)\cr
&\qquad∨{\bf f}=clear\;top\;{\bf b}&(15)\cr
&\qquad∨({\bf l}=top\;{\bf b}∧{\bf f}=false)),\cr
{\cal T}[precond({\bf f},paint({\bf b},{\bf c}))]=&\quad false,&(16)\cr
{\cal T}[holds(at({\bf b},{\bf l}),S0)]=&\quad {\bf l}=Table,&(17)\cr
{\cal T}[holds(color({\bf b},{\bf c}),S0)]=&\quad {\bf c}=White,&(18)\cr
{\cal T}[holds({\bf p},result({\bf a},{\bf s}))]=&
\cases{{\cal T}[holds({\bf f},{\bf s})],&if ${\cal S}[{\bf a},{\bf s}]
∧{\cal T}[causes({\bf a},{\bf p},{\bf f})]$,\cr
{\cal T}[holds({\bf p},{\bf s})],&otherwise,\cr}&(8')\cr
{\cal T}[holds(false,{\bf s})]=&\quad false,&(9)\cr
{\cal T}[holds(true,{\bf s})]=&\quad true,&(10)\cr
{\cal T}[holds(clear\;{\bf l},{\bf s})]=
&\quad \bigwedge↓{\bf b}
¬{\cal T}[holds(at({\bf b},{\bf l}),{\bf s})],&(19)\cr
{\cal T}[relevant\;{\bf s}]=&\quad true.&(11)\cr
}$$
\proclaim Proposition 6. For any ground atom {\bf A} and any minimal model
$M$ of $B$, the truth value of {\bf A} in $M$ is ${\cal T}[{\bf A}]$.
Proofs of Propositions 4---6 are given in the appendix.
The finite automaton described by the equations for ${\cal T}$ has $N↑2+4N$ input
symbols (the ground terms representing actions) and $2↑{N↑2+4N}$ states
(Boolean vectors whose components correspond to
the ground terms representing primitive
fluents). In view of the restrictions $B0.2$, $B0.3$, only few of these states
are accessible. The accessible states correspond to the equivalence classes of the
situation space relative to the relation
$$\bigwedge↓{\bf p}\bigl(holds({\bf p},s)≡holds({\bf p},s')\bigr),$$
where {\bf p} ranges over the ground terms representing primitive fluents.
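To make the automaton reading concrete, the equations for ${\cal T}$ can be
run directly as a program. The following small sketch is ours and purely
illustrative: it evaluates a two-block instance, terms are encoded ad hoc as
nested tuples, and ${\cal S}[{\bf a},{\bf s}]$ is taken to be the conjunction
of the ${\cal T}$-values of the preconditions of {\bf a}.
\beginlsverbatim
# Illustrative sketch of equations (13)-(19) for two blocks A, B.
BLOCKS = ['A', 'B']

def preconds(action):
    """Fluents f with T[precond(f, a)] = true, per (15)-(16)."""
    if action[0] == 'move':
        _, b, l = action
        fs = [('clear', ('top', b))]
        if l != 'Table':
            fs.append(('clear', l))
        if l == ('top', b):
            fs.append('false')   # a block can never be moved onto its own top
        return fs
    return []                    # paint has no preconditions, by (16)

def causes(action, fluent):
    """T[causes(a, p, f)] per (13)-(14): the caused value, or None."""
    if action[0] == 'move' and fluent[0] == 'at' and fluent[1] == action[1]:
        return fluent[2] == action[2]
    if action[0] == 'paint' and fluent[0] == 'color' and fluent[1] == action[1]:
        return fluent[2] == action[2]
    return None

def holds(fluent, sit):
    """T[holds(p, s)] per (8'), (9), (10), (17)-(19)."""
    if fluent == 'false': return False
    if fluent == 'true': return True
    if fluent[0] == 'clear':        # (19): clear is a defined fluent
        return not any(holds(('at', b, fluent[1]), sit) for b in BLOCKS)
    if sit == 'S0':                 # (17)-(18): on the table, painted white
        return fluent[2] == ('Table' if fluent[0] == 'at' else 'White')
    action, prev = sit
    if all(holds(f, prev) for f in preconds(action)):   # S[a, s]
        v = causes(action, fluent)
        if v is not None:
            return v                # law of change
    return holds(fluent, prev)      # law of inertia

s1 = (('move', 'A', ('top', 'B')), 'S0')
s2 = (('paint', 'B', 'Red'), s1)
assert holds(('at', 'A', ('top', 'B')), s1)
assert not holds(('at', 'A', 'Table'), s1)
assert holds(('color', 'B', 'Red'), s2)
assert holds(('color', 'A', 'White'), s2)   # inertia: A stays white
\endlsverbatim
The state reached after each action is just the Boolean vector of values of
the primitive fluents, as in the automaton description above.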
\bigskip
\bigskip
\centerline{\bf Appendix E. Miracles in Formal Theories of Action}
\bigskip
\centerline{Vladimir Lifschitz and Arkady Rabinov}
\bigskip
\sectCounter=0
\ecount=0
\def\hyperquad{\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad}
\def\sit{\hbox{\it Sit}}
%\def\l#1#2{\begin{equation}#1\label{#2}\end{equation}}
\def\l#1#2{$$#1\leql{#2}$$}
\def\br#1{\lbrack\!\lbrack#1\rbrack\!\rbrack}
\def\ref{\eqref}
\def\ll#1#2{$$#1$$}
\def\q{\;}
\def\ds{\hbox{-}}
\def\i{\hbox{if }}
\def\o{\overline}
\def\ow{\hbox{otherwise}}
\def\ha{\hbox{$\;$and$\;$}}
\def\ct{{\cal T}}
\def\cs{{\cal S}}
\def\cv{{\cal V}}
\def\tct{$\ct $}
\def\tcv{$\cv $}
\def\bo#1{\hbox{\bf #1}}
\def\wh{\hbox{\it expected$\;$}}
\def\ea{\hbox{\it earlier$\;$}}
\def\xea{\hbox{\it earlier}}
\def\pre{\hbox{\it precond$\;$}}
\def\xpre{\hbox{\it precond}}
\def\va{\hbox{\it value$\;$}}
\def\xva{\hbox{\it value}}
\def\hd{\hbox{\it holds$\;$}}
\def\ca{\hbox{\it causes$\;$}}
\def\cl{\hbox{\it clear$\;$}}
\def\xca{\hbox{\it causes}}
\def\mi{\hbox{\it miracle$\;$}}
\def\xmi{\hbox{\it miracle}}
\def\sca{\hbox{\it s-causes$\;$}}
\def\pai{\hbox{\it paint$\;$}}
\def\wt{\hbox{\it wait$\;$}}
\def\xwt{\hbox{\it wait}}
\def\ld{\hbox{\it load$\;$}}
\def\xld{\hbox{\it load}}
\def\ldd{\hbox{\it loaded$\;$}}
\def\xldd{\hbox{\it loaded}}
\def\al{\hbox{\it alive$\;$}}
\def\xal{\hbox{\it alive}}
\def\sht{\hbox{\it shoot$\;$}}
\def\xsht{\hbox{\it shoot}}
\def\ef{\hbox{\it effects$\;$}}
\def\af{\hbox{\it affects$\;$}}
\def\xaf{\hbox{\it affects}}
\def\ab{\hbox{\it ab$\;$}}
\def\xab{\hbox{\it ab}}
\def\at{\hbox{\it at$\;$}}
\def\tab{\hbox{\it Table$\;$}}
\def\no{\hbox{\it not$\;$}}
\def\eq{\hbox{\it equal$\;$}}
\def\mv{\hbox{\it move$\;$}}
\def\stp{\hbox{\it staple$\;$}}
\def\stpd{\hbox{\it stapled$\;$}}
\def\cut{\hbox{\it cut$\;$}}
\def\sp{\hbox{\it s-precond}}
\def\and{\hbox{\it and$\;$}}
\def\res{\hbox{\it result$\;$}}
\def\xres{\hbox{\it result}}
\def\suc{\hbox{\it success$\;$}}
\def\xsuc{\hbox{\it success}}
\def\alw{\hbox{\it always$\;$}}
\def\tr{\hbox{\it true$\;$}}
\def\xtr{\hbox{\it true}}
\def\fl{\hbox{\it false$\;$}}
\def\xfl{\hbox{\it false}}
\def\sti{\hbox{\it fasten$\;$}}
\def\un{\hbox{\it unfasten$\;$}}
\def\stu{\hbox{\it fastened$\;$}}
\def\ra{{\rightarrow}}
%\newcounter{ca1}
%\newcounter{ca2}
%\newcounter{pre1}
%\newcounter{pre2}
%\newcommand{\ttt}[1]{{\parbox[t]{3in}{#1}}}
\def\nx{\hbox{\it next$\;$}}
\def\0{\hbox{$\overline 0$}}
\def\1{\hbox{$\overline 1$}}
\def\2{\hbox{$\overline 2$}}
\def\N{\hbox{$\overline N$}}
\def\tca{\hbox{$\ct$-\ca}}
\def\tsuc{\hbox{$\ct$-\suc}}
\def\tpre{\hbox{$\ct$-\pre}}
\def\c{\hbox{\rm Circum}}
%\begin{document}
%\baselineskip 23pt
%\title
%{\Large \bf Miracles in Formal Theories of Action}
ABSTRACT. Most work on reasoning about action is based on the implicit assumption
that there are no events happening in the world concurrently with the
actions that are being carried out. We discuss the possibility of relaxing
this assumption and treating it as a default principle---if
this assumption is inconsistent with the given facts, then we will admit the possibility of
unknown events, ``miracles,'' that, along with the given actions,
contribute to the properties of the new situation. The formalism proposed
by one of the authors in the paper {\it Formal Theories of Action}
does not treat ``miracles'' properly. We discuss a modification of
that approach which corrects this problem.
\section{Introduction}
The approach to formalizing reasoning about actions proposed in
(Lifschitz 1987) is based
on the situation calculus (McCarthy and Hayes 1969)
and circumscription (McCarthy 1980, 1986).
Two new predicates introduced in (Lifschitz 1987) are $\ca(a,f,v)$, expressing that
the action $a$ causes the fluent $f$ to take on the value $v$, and $\pre(f,a)$,
expressing that the propositional fluent $f$ is a precondition for the action
$a$. For instance, the axioms
$$\ca(\mv(x,l),\at(x,l),\tr)$$
and
$$\pre(\cl l,\mv(x,l))$$
represent the assumptions that moving a block $x$ to a location $l$ causes $x$ to
be at that location, and that a location must be clear when a block is moved
there.
The relation between these predicates and the values of fluents can be described
by two axioms: the ``law of change''
\l{\suc(a,s) ∧ \ca(a,f,v) ⊃ \va(f,\res(a,s))=v}{l1}
and the ``law of inertia''
\l{¬\af(a,f,s) ⊃ \va(f,\res(a,s))=\va(f,s),}{l2}
where $\suc$ and $\af$ are defined by
\l{\suc(a,s) ≡ ∀f(\pre(f,a) ⊃ \va(f,s)=\tr)}{def1}
(an action is successful if all its preconditions are satisfied), and
\l{\af(a,f,s) ≡ \suc(a,s) ∧ ∃v\;\ca(a,f,v)}{def2}
(an action affects a fluent $f$ if it is successful and causes $f$ to take on
some value). Both \ca and \pre are circumscribed, to ensure that actions
have no effects or preconditions other than those postulated in the axioms.
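For instance, if circumscription makes the two axioms displayed above the
only \xca and \xpre facts about the action $\mv(x,l)$, then (\ref{def1}) and
(\ref{def2}) reduce to
$$\suc(\mv(x,l),s) ≡ \va(\cl l,s)=\tr,\qquad
\af(\mv(x,l),\at(x,l),s) ≡ \suc(\mv(x,l),s),$$
so that, by (\ref{l1}), moving a block to a clear location always brings it
there.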
This formalism was designed as an alternative to John McCarthy's original idea
of solving the frame problem by {\it minimizing change}, i.e., assuming that
{\it fluents normally do not change their values} (McCarthy 1986). Formally,
this can be expressed by
\l{¬\ab(a,f,s) ⊃ \va(f,\res(a,s))=\va(f,s),}{l3}
where \ab is a circumscribed ``abnormality'' predicate.
After that approach was found unsatisfactory (Hanks and McDermott 1986),
several improvements were
proposed, and (\ref{l2}) is one of them.
The form of reasoning investigated in (Lifschitz 1987) is known as {\it temporal
projection} (Hanks and McDermott 1986).
In a temporal projection problem, we are given
information about the values
of fluents in an initial situation, and our purpose is to predict the values of
fluents after a certain sequence of actions is performed. This kind of reasoning is
important because of its close connection with planning. The problem of
verifying that a given plan achieves a given goal is a temporal projection
problem.
But there are also other, more complex forms of reasoning about actions. Given
information about the situation $\res(a,S0)$, we may wish to determine the
values of fluents in the ``past'' situation $S0$. For instance,
if all blocks are on the table in the situation
$$S1=\res(\mv(A,\tab),S0),$$
then we can
conclude that all blocks other than $A$ were on the table in the situation $S0$.
More generally, we may begin with some facts about the situation that results
when several actions $a↓1,\dots,a↓k$ are carried out consecutively, and draw
conclusions about the initial situation $S0$. Or we may know the values of some
fluents in two (or more) situations,
and draw conclusions about the values of other fluents.
For instance, if the top of $B$ was not clear in $S0$ but is clear in $S1$,
after $A$ was moved, then it follows that
$A$ was on top of $B$ in $S0$.
Although more complex than temporal projection, these forms of reasoning can
be formalized in the framework of (Lifschitz 1987)
in a straightforward way. The only
difference is that, instead of (or in addition to) ``initial conditions,''
specifying the values of fluents in the initial situation $S0$, the axiom set
will contain facts about the values of fluents in some situations that are
``in the future'' of $S0$, such as $S1$.
All these examples are based on the implicit assumption that {\it there are
no events happening in the world concurrently with the actions that are
being carried out}, or at least no events that can affect the values of the fluents
we are interested in. Without this assumption, we can hardly make any nontrivial
assertions about the outcome of a given action. It is possible, for instance,
that the position of $B$ will change after we move $A$---if someone else
can simultaneously move $B$, or if $B$ can fall down from its original
position by itself. Moreover, we cannot even claim that $A$ will get to its
destination, if there is a possibility that someone else holds $A$ in its
place while we are trying to move it.
In this note we discuss the possibility of relaxing this important assumption
and treating it as a ``default'' principle. {\it In the absence of information to
the contrary}, we will assume, as before, that there are
no events happening in the world concurrently with the actions that are
being carried out. But if this assumption is inconsistent with the given facts,
then we will be prepared to admit the possibility of events that, along with
the actions in question, contribute to the properties of the new situation.
When some properties of a situation cannot be explained by the action that leads
to it, we will say that a ``miracle'' happened in that situation. We are
prepared to admit a ``miracle,'' when this is warranted by the available information
about the values of fluents---and only in that case.
In temporal projection problems and in the other examples of reasoning discussed
above, allowing ``miracles'' is not necessary, and we would like our formalism
to lead to the same conclusions as before. But if the axioms imply that the
position of $B$ changed when the action $\mv(A,\tab)$ was carried out in the
situation $S0$, then the conclusion should be that a ``miracle'' happened.
This ``miracle'' should not affect any situations in the future of $S1$.
Unfortunately, the approach proposed in (Lifschitz 1987) does not treat ``miracles''
correctly. If $B$ changed its position when $A$ was moved onto the table in the
situation $S0$, then the ``law of inertia'' (\ref{l2}) will imply that the action
$\mv(A,\tab)$ affects the position of $B$ in $S0$. By the definition of
\xaf, it will follow then that this action causes $B$ to move to another
location---irrespective of the particular situation in which the action is
carried out, because the predicate $\ca$ has no situation arguments and describes
properties of actions in general (``action types''), rather than their specific
instances. And then we will have to conclude that
$\mv(A,\tab)$ executed {\it in any other situation} will affect the position of
$B$ in the same way. Confronted with ``miracles,'' the formalism of (Lifschitz
1987) leads to conclusions that are too drastic.
The purpose of this note is to show that this problem can be easily corrected.
The main idea of our solution is discussed in Section 2. Then we illustrate it
by formalizing two simple examples, and state a few propositions showing that the
formalizations are adequate (Sections 3 and 4). The proofs are sketched in Section
5.
\section{Miracles are Abnormal}
The solution consists in combining the method of (Lifschitz 1987)
with McCarthy's original
idea of minimizing change. Formula (\ref{l2}) requires that any change in the value of
$f$ be explained as the effect of the action that was carried out. Formula (\ref{l3})
requires that any change in the value of $f$ be treated as an ``abnormality,''
subject to circumscription. In the new formulation, both viewpoints will be
presented, the latter being reserved for describing miracles. Formally, this
can be done by postulating the following ``hybrid'' of (\ref{l2}) and (\ref{l3}):
\l{¬\af(a,f,s)∧¬\ab(a,f,s) ⊃ \va(f,\res(a,s))=\va(f,s).}{l4}
In temporal projection problems and other cases when miracles are not involved,
there is no conflict between the tasks of minimizing \ca, \pre and \xab; minimizing
\ab will make it identically false. But if moving $A$ leads to a situation in which
the location of $B$ changes, then we have a choice: either \ab or \af
should be made true at the corresponding point $(a,f,s)$. We have seen that the
second possibility leads to undesirable conclusions about the effect of $a$ on
$f$ in general. We can force circumscription to
select the first possibility by assigning to \ab a lower priority than to \ca and \xpre.
Instead of \xab, we will use the more suggestive notation $\mi(f,s)$ (``a miracle
happened to the fluent $f$ in the situation $s$''). Then (\ref{l4}) will become
$$¬\af(a,f,s)∧¬\mi(f,\res(a,s)) ⊃ \va(f,\res(a,s))=\va(f,s),$$
or, equivalently,
\l{\eqalign{&\neg\af(a,f,s)
\supset[\va(f,\res(a,s))=\va(f,s)\cr
&\hyperquad \lor \mi(f,\res(a,s))].\cr}}{l5}
The ``law of change'' (\ref{l1}) will be weakened in a similar way, and replaced by
\l{\eqalign{
&\suc(a,s)\land\ca(a,f,v)
\supset[\va(f,\res(a,s))= v\cr
&\hyperquad \lor \mi(f,\res(a,s))].\cr}}{l6}
Formula (\ref{l5}) deals with ``positive miracles,'' when a fluent not
affected by an action changes its value after the action is carried out. In
(\ref{l6}), we
provide for the possibility of ``negative miracles,'' when a fluent
affected by an action takes on a value other than the one the action causes
(possibly just retaining its old value).
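For example, if the axioms entail $\suc(a,s)$, $\ca(a,f,\tr)$ and
$\va(f,\res(a,s))=\fl$, then (\ref{l6}) together with $\fl≠\tr$ forces the
negative miracle $\mi(f,\res(a,s))$, while nothing is concluded about the
effect of $a$ on $f$ in other situations.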
In addition to making the formalism more flexible, the predicate \mi helps us
resolve a technical problem that was handled in (Lifschitz 1987) in
a rather unintuitive
way. In some models of the axioms, the initial situation $S0$ is the
result of executing a certain action $a$ in an earlier situation:
$S0=\res(a,s)$.
What if $a$ affects the value of some fluent in a way inconsistent with the
initial conditions? For instance,
$S0$ may be the result of painting $A$ red, while the initial conditions
require that $A$ be white in the situation $S0$.
The only possible explanation would be that painting has
preconditions not mentioned in the axioms. To eliminate this undesirable
conclusion, axioms (\ref{l1}) and (\ref{l2}) were restricted in (Lifschitz 1987)
to ``relevant'' situations, and such an earlier situation $s$ was required not to be relevant.
The new approach gives a better solution: We can treat such possibilities as
miracles happening in the initial situation.
These ideas are illustrated below on a simple example involving the actions
\xwt, \ld and \xsht.
First we show that miracles do not interfere with temporal projection.
Then we add an axiom saying that in the situation corresponding to the
last scene of the classical scenario,
$$\res(\sht,\res(\wt,\res(\ld,S0))),$$
the value of \al is \xtr. We will see that each minimal model of this theory has
exactly one miracle in the future of $S0$. In some models, a negative miracle
happens to \ldd in the situation $\res(\ld,S0)$, so that the gun
is not loaded in that situation. In other models, a positive miracle
happens to this fluent in the situation $\res(\wt,\res(\ld,S0))$.
Finally, there are models in which the gun is loaded after \xwt, but a negative
miracle happens to \al in the final situation. These three
possibilities correspond to the three simplest explanations of the unexpected outcome.
\section{Example: Temporal Projection}
%\setcounter{equation}{0}
%\renewcommand{\theequation}{\mbox{A.\arabic{equation}}}
We use variables of four sorts:
for truth values ($v$, $v1$, $\dots$),
for actions ($a$, $a1$, $\dots$),
for propositional fluents ($f$, $f1$, $\dots$),
and for situations ($s$, $s1$, $\dots$).
The object constants are: \fl and \tr (truth values);
\xwt, \ld and \sht (actions); \ldd and \al (propositional fluents);
$S0$ (situation). The binary function \res$\;$ transforms an action and a situation
into a situation; the binary function \va transforms a propositional fluent and
a situation into a truth value. Finally, there are four predicate constants:
$\ea(s1,s2)$ --- ``$s1$ is earlier than $s2$'';
$\mi(f,s)$ --- ``a miracle happens to $f$ in the situation $s$'';
$\pre(f,a)$ --- ``$f$ is a precondition for $a$'';
$\ca(a,f,v)$ --- ``$a$ causes $f$ to take on the value $v$''.
The abbreviations \suc and \af are defined by (\ref{def1}) and (\ref{def2}).
We will use $\bo{v}$ as a metavariable for the constants \fl and
\xtr, and $\bo{a}$, $\bo{f}$ and $\bo{s}$ as
metavariables for the ground terms representing actions, propositional fluents
and situations respectively.
The axiom set consists of five groups.
%\begin{enumerate}
\item{1.} Temporal order:
\l{\ea(s,\res(a,s)),}{A-ea-1}
\l{\ea(s1,s2)\land \ea(s2,s3)\supset \ea(s1,s3),}{A-ea-2}
\l{\neg \ea(s,s).}{A-ea-3}
\item{2.} Change and inertia:
\l{\eqalign{
&\suc(a,s)\land\ca(a,f,v)
\supset[\va(f,\res(a,s))= v\cr
&\hyperquad \lor \mi(f,\res(a,s))],\cr}}{A-lc}
\l{\eqalign{&\neg\af(a,f,s)
\supset[\va(f,\res(a,s))=\va(f,s)\cr
&\hyperquad \lor \mi(f,\res(a,s))].\cr}}{A-li}
\item{3.} Closure and uniqueness:
\l{v=\fl \lor v=\tr,}{A-u-5}
\l{\fl≠\tr,}{A-u-2}
\l{\wt≠\ld,\;\wt≠\sht,\;\ld≠\sht,}{A-u-1}
\l{\ldd≠\al,}{A-u-3}
\l{\res(a1,s1)=\res(a2,s2)\supset (a1=a2 \land s1=s2).}{A-u-4}
\item{4.} Effects of actions:
\l{\ca(\ld,\ldd,\tr),}{A-ca-1}
\l{\ca(\sht,\ldd,\fl),}{A-ca-2}
\l{\ca(\sht,\al,\fl),}{A-ca-3}
\l{\pre(\ldd, \sht).}{A-pre-1}
\item{5.} Initial conditions:
\l{\va(\al,S0)=\tr,}{A-ini-1}
\l{\va(\ldd,S0)=\fl.}{A-ini-2}
%\end{enumerate}
%\setcounter{equation}{6}
%\renewcommand{\theequation}{\mbox{\arabic{equation}}}
The conjunction of the universal closures of these axioms will be denoted by $A$.
We circumscribe the predicates \xca, \pre and \mi
relative to $A$, with \mi given a lower priority than the others, and with
the function \va allowed to vary. In the notation of (Lifschitz 1985), this
circumscription can be written as
\l{\c(A;\ca,\pre>\mi;\va).}{circ}
By a {\it minimal} model of $A$ we understand
any model of this circumscription.
The following propositions show that this formalization is adequate
for temporal projection. To begin with, it is consistent:
\proclaim Proposition 3.1.
$A$ has minimal models.
Next we want to establish that the circumscribed predicates have the values that
we intuitively expect. For \ca and \xpre, these values are given by the
formulas
\l{\eqalign{
\ca(a,f,v)\equiv\quad &(a=\ld\land f=\ldd\land v=\tr)\cr
\lor &(a=\sht\land f=\ldd\land v=\fl)\cr
\lor &(a=\sht\land f=\al\land v=\fl),\cr
\pre(f,a)\equiv\quad &(f=\ldd\land a=\sht).\cr
}}{A-min-1}
About \mi we would like to show that it is false in every situation $s$
that is the result of executing a sequence of actions in
the initial situation $S0$. How can this restriction on $s$ be expressed by
a formula? The formula
$\ea(S0,s)$ will not do: the temporal order axioms are not sufficiently strong to
guarantee that any $s$ satisfying this condition belongs to the same ``world
history'' as $S0$.
The property that we need can be defined explicitly. Consider the binary
relations $\le$ and $<$ defined by:
\ll{s1\le s2 \equiv \forall p
\{p(s1) \land [\forall s\,a(p(s)\supset p(\res(a,s)))]\supset p(s2)\},}{}
\ll{s1<s2 \equiv s1\le s2 \land s1\not=s2.}{}
Then the formula $S0<s$ expresses that $s$ is the result of executing a nonempty
sequence of actions in $S0$. It is easy to see that the axioms $A$ imply
%$$s1<s2⊃\ea(s1,s2),$$ and also
$S0\le{\bo s}$ for any situation term {\bo s} without
variables.
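For example, $S0\le\res(\ld,S0)$: if $p(S0)$ and $p$ is closed under \xres,
i.e., $∀s\,a(p(s)⊃p(\res(a,s)))$, then in particular $p(\res(\ld,S0))$;
induction on the structure of {\bo s} extends this to an arbitrary ground
situation term.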
\proclaim Proposition 3.2.
Every minimal model of $A$ satisfies (\ref{A-min-1}) and
\ll{S0<s\supset\neg\mi(f,s).}{}
There are formulas that are true in some minimal models of $A$ and false
in other minimal models,
so that they can be neither proved nor refuted on the
basis of circumscription (\ref{circ}). The ``domain closure assumption'' for
fluents,
$$∀f(f=\al∨f=\ldd),$$
can serve as an example. Our ``circumscriptive theory'' is
incomplete. Proposition 3.2 implies, however, that for ground atoms of the forms
\l{\ca(\bo{a},\bo{f},\bo{v}),\quad\pre(\bo{f},\bo{a}),}{atoms1}
\l{\mi(\bo{f},\bo{s})\qquad(\bo{s}\hbox{ different from }S0)}{atoms2}
a definite answer can always be derived from (\ref{circ}), so that each of these
atoms is either true in every minimal model of $A$, or false in every such model.
Notice that $\mi({\bo f},S0)$ is not on this list; it depends on the model whether
any miracles happened in the initial situation. For any ground atom {\bo F} that
has one of the forms (\ref{atoms1}), (\ref{atoms2}), let $\ct[{\bo F}]$ denote
\tr or \fl depending on whether {\bo F} is true or false in the minimal models
of $A$.
Finally, we can show that the value of a fluent in a situation is completely
determined if both of them are represented by ground terms. The function $\cv$, mapping
pairs $({\bo f},{\bo s})$ into $\{\fl,\tr\}$,
is defined by the following recursive equations:
\ll{\cv[\ldd,S0]=\fl,\qquad \cv[\al,S0]=\tr,}{}
\ll{\cv[\bo{f},\res(\bo{a},\bo{s})]=\cases{
\bo{v},&\i $\cs[\bo{a},\bo{s}]\land \ct[\ca(\bo{a},\bo{f},\bo{v})]$,\cr
\cv[\bo{f},\bo{s}],&\ow,\cr
}}{}
where
\ll{\cs[\bo{a},\bo{s}]=\bigwedge_{\bo{f}:\ct[\pre(\bo{f},\bo{a})]}\cv[\bo{f},\bo{s}].}{}
(The correctness of this definition depends on the fact that
$\ct[\ca(\bo{a},\bo{f},\fl)]$ and $\ct[\ca(\bo{a},\bo{f},\tr)]$ cannot both be
equal to \xtr.)
\proclaim Proposition 3.3. In every minimal model of $A$,
\ll{\va({\bo f},{\bo s}) = \cv[{\bo f},{\bo s}].}{}
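The recursive computation of $\cv$ is easily mechanized. The following
sketch is ours and purely illustrative: situations are encoded as nested
pairs, and the $\ct$-values of \xca and \xpre are read off (\ref{A-min-1}).
\beginlsverbatim
# Illustrative evaluator cv/cs for the load/wait/shoot domain.
CAUSES = {('load',  'loaded'): True,    # causes(load, loaded, true)
          ('shoot', 'loaded'): False,   # causes(shoot, loaded, false)
          ('shoot', 'alive'):  False}   # causes(shoot, alive, false)
PRECOND = {'shoot': ['loaded']}         # precond(loaded, shoot)
INITIAL = {'loaded': False, 'alive': True}   # axioms (A-ini-1), (A-ini-2)

def cs(action, sit):
    # cs[a, s]: all preconditions of a hold in s
    return all(cv(f, sit) for f in PRECOND.get(action, []))

def cv(fluent, sit):
    # cv[f, s]: the truth value of f in the situation s
    if sit == 'S0':
        return INITIAL[fluent]
    action, prev = sit
    if cs(action, prev) and (action, fluent) in CAUSES:
        return CAUSES[(action, fluent)]     # law of change
    return cv(fluent, prev)                 # law of inertia

s3 = ('shoot', ('wait', ('load', 'S0')))
assert cv('loaded', ('load', 'S0'))   # the gun gets loaded
assert not cv('alive', s3)            # temporal projection: shooting succeeds
\endlsverbatim
In agreement with Proposition 3.3, every ground pair receives a definite
truth value.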
\section{Example: Three Miracles}
In the second example, one more sentence is added to the last group of axioms:
%\renewcommand{\theequation}{\mbox{A.17}}
\l{\va(\al,\res(\sht,\res(\wt,\res(\ld,S0))))=\tr.}{mir1}
%\addtocounter{equation}{-1}
%\renewcommand{\theequation}{\mbox{\arabic{equation}}}
By $A'$ we will denote the conjunction of $A$ and (\ref{mir1}). The
circumscription policy remains the same as before.
In this case, we would like to demonstrate the existence of
not only one minimal model, but of minimal models of three different types,
corresponding to the three miracles that can be responsible for the unexpected
outcome of shooting (see the end of Section 2). These types of
models can be conveniently characterized using the following notation:
\ll{\bo f_{1}=\bo f_{2}=\ldd,\;\bo f_{3}=\al,}{}
\ll{\bo{s}_{1}=\res(\ld,S0),\;\bo{s}_{2}=\res(\wt,\bo{s}_{1}),\;
\bo{s}_{3}=\res(\sht,\bo{s}_{2}).}{}
\proclaim Proposition 4.1.
For every $i$ $(1≤i≤3)$, $A'$ has a minimal model satisfying
$$\mi(\bo f_{i},\bo s_{i}).$$
The following counterpart of Proposition 3.2 shows that each minimal model of
$A'$ has exactly one miracle happening in the future of $S0$, and this miracle
is one of the three mentioned in Proposition 4.1.
\proclaim Proposition 4.2.
Every minimal model of $A'$ satisfies (\ref{A-min-1}) and
\ll{\bigvee↓{1≤i≤3}∀f\,s\,[S0<s⊃(\mi(f,s)≡(f={\bo f}↓i ∧ s={\bo s}↓i))].}{}
In place of the functions $\ct$ and $\cv$, the analysis of the
extended example requires three pairs
of functions, $\ct_i$ and $\cv_i$ ($1≤i≤3$), that correspond to the three types of
minimal models. On atoms (\ref{atoms1}), $\ct↓i$ coincides with $\ct$; on atoms
(\ref{atoms2}), it is defined by
\ll{\ct↓i[\mi(\bo{f},\bo{s})] = (\bo{f}=\bo{f}↓i ∧ \bo{s}=\bo{s}↓i).}{}
Propositions 4.1 and 4.2 show that there are indeed three types of minimal models
of $A'$; for any model $M$ of the $i$-th type,
a ground atom {\bo F} of the form (\ref{atoms1}) or
(\ref{atoms2}) is true in $M$ iff $\ct↓i[\bo{F}]$ is \xtr.
The functions $\cv_i$ are defined by the recursive equations:
\ll{\cv↓i[\ldd,S0]=\fl,\qquad \cv↓i[\al,S0]=\tr,}{}
\ll{
\cv↓i[\bo{f},\res(\bo{a},\bo{s})]=\cases{
\bo{v}\equiv¬\ct↓i[\mi(\bo{f},\res(\bo{a},\bo{s}))],
&\i $\cs↓i[\bo{a},\bo{s}]\land \ct↓i[\ca(\bo{a},\bo{f},\bo{v})]$,\cr
\cv↓i[\bo{f},\bo{s}]\equiv¬\ct↓i[\mi(\bo{f},\res(\bo{a},\bo{s}))],&\ow,
\cr
}}{}
where
\ll{\cs↓i[\bo{a},\bo{s}]=\bigwedge_{\bo{f}:\ct↓i[\pre(\bo{f},\bo{a})]}\cv↓i[\bo{f},\bo{s}].}{}
\proclaim Proposition 4.3.
In every minimal model of $A'$ of the $i$-th type $(1≤i≤3)$,
\ll{\va({\bo f},{\bo s}) = \cv↓i[{\bo f},{\bo s}].}{}
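Continuing the sketch of Section 3 (the fragment below is to be run together
with it, and carries the same caveats), the three model types are obtained by
letting the single miracle $({\bo f}↓i,{\bo s}↓i)$ reverse the value that
change or inertia would otherwise yield, exactly as in the equations for
$\cv↓i$.
\beginlsverbatim
# Illustrative only; relies on CAUSES, PRECOND, INITIAL above.
S1 = ('load', 'S0'); S2 = ('wait', S1); S3 = ('shoot', S2)
MIRACLES = [('loaded', S1), ('loaded', S2), ('alive', S3)]  # (f_i, s_i)

def cv_i(i, fluent, sit):
    if sit == 'S0':
        return INITIAL[fluent]
    action, prev = sit
    if all(cv_i(i, f, prev) for f in PRECOND.get(action, [])) \
            and (action, fluent) in CAUSES:
        expected = CAUSES[(action, fluent)]   # law of change
    else:
        expected = cv_i(i, fluent, prev)      # law of inertia
    # a miracle at (fluent, sit) reverses the expected value
    return expected != (MIRACLES[i-1] == (fluent, sit))

# each model type satisfies the unexpected observation (mir1)
assert all(cv_i(i, 'alive', S3) for i in (1, 2, 3))
\endlsverbatim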
\section{Proofs}
\noindent{\bf Proof of Proposition 3.1.}
A minimal model $M$ of $A$ can be constructed as follows.
The domain of truth values consists of the constants \fl and \xtr; the domains of
actions, fluents and situations consist of all ground terms of the corresponding
sorts. The object constants and \res are interpreted in the natural way;
\va is interpreted as $\cv$. The interpretations of \ca and \pre are given by
$\ct$; \mi is identically false.
%$\Box$
\medskip
\noindent{\bf Proof of Propositions 3.2 and 3.3.}
Let $M$ be a model of $A$. Consider the formulas $F↓n(s)$ defined by:
\ll{
\eqalign{
&F_0(s)≡s=S0,\cr
&F_{n+1}(s)≡∃s1\;a(F↓n(s1)∧s = \res(a,s1)).
\cr
}}{}
Let $\sit↓n$ ($n≥0$) be the set of the situations in $M$ that satisfy $F↓n(s)$.
The sets $\sit↓n$ are disjoint, and their union
is the set of situations $s$ satisfying $S0\le s$.
Consider the structure $M↑*$ obtained from $M$ by changing the interpretations
of \xca, \xpre, \mi and \va as follows. The predicates
\ca and \pre are defined in $M↑*$ by (\ref{A-min-1}). The values of
$\mi(f,s)$ and $\va(f,s)$ will change only for the values of $s$ belonging to
some $\sit↓n$ ($n>0$). For every such value of $s$ and every value of $f$,
$\mi(f,s)$ is false in $M↑*$. For every value of $s$ from $\sit↓n$ and
every value of $f$, $\va(f,s)$ is defined by induction on $n$ so that the axioms
(\ref{A-lc}) and (\ref{A-li}) are satisfied.
It is easy to check that $M↑*$ is a model of $A$, and that
$M↑*=M$ if $M$ is minimal. The assertions of Propositions 3.2 and 3.3
immediately follow.
%$\Box$
\medskip
\noindent{\bf Proof of Proposition 4.1.}
Similar to the proof of Proposition 3.1, except that, in the $i$-th model,
the interpretation of \va is given by $\cv↓i$, and \mi is defined by
\ll{\mi(f,s)≡(f=\bo{f}↓i∧s=\bo{s}↓i).}{}
%$\Box$
\proclaim Lemma 5.1.
Every minimal model of $A'$ satisfies (\ref{A-min-1}) and
\l{\bigvee↓{1≤i≤3}\mi({\bo f}↓i,{\bo s}↓i).}{or-3}
\noindent{\bf Proof.}
Let $M$ be a minimal model of $A'$.
Consider the structure $M↑*$ obtained from $M$ by changing the interpretations
of \xca, \pre and \mi as follows:
\ca and \pre are defined by (\ref{A-min-1}), and \mi is identically true.
It is easy to check that $M↑*$ is a model of $A'$. Since $M$ is minimal, it follows
that \ca and \pre are interpreted in $M$ in the same way as in $M↑*$ (recall that
these predicates are circumscribed at a higher priority than \xmi). Consequently,
$M$ satisfies (\ref{A-min-1}). Furthermore, it follows from the axioms $A$,
from (\ref{A-min-1}) and from the negation of (\ref{or-3}) that
\ll{\va(\al,\res(\sht,\res(\wt,\res(\ld,S0))))=\fl,}{}
contrary to (\ref{mir1}).
\medskip
\noindent{\bf Proof of Propositions 4.2 and 4.3.}
Let $M$ be a minimal model of $A'$. We already know from Lemma 5.1 that $M$
satisfies (\ref{A-min-1}) and (\ref{or-3}).
Consider the structure $M↑*$ obtained from $M$ by changing the interpretations
of \mi and \va as follows. The values of
$\mi(f,s)$ and $\va(f,s)$ will change only for the values of $s$ belonging to
some $\sit↓n$ ($n>0$). By Lemma 5.1, $M$ satisfies at least one of the conditions
$\mi(\bo f_{i},\bo{s}_{i})$ ($1≤i≤3$); define
$\mi(f,s)≡(f={\bo f}↓i ∧ s={\bo s}↓i)$ for every value of $f$ and
every value of $s$ from some $\sit↓n$ ($n>0$). The interpretation of
$\va(f,s)$ is defined in the same way as in the proof of Propositions 3.2 and 3.3.
It is easy to check that $M↑*$ is a model of $A'$, and that
$M↑*=M$ if $M$ is minimal. The assertions of Propositions 4.2 and 4.3
immediately follow.
\section{Acknowledgements}
We would like to thank Andrew Baker, Michael Gelfond, Matthew Ginsberg and John
McCarthy for useful discussions related to the subject of this paper.
\bigbreak
\centerline{\bf References}
\bigskip
{
\parindent=0pt \parskip 8pt
Brewka, Gerhard (1987).
The Logic of Inheritance in Frame
Systems. In: {\it Proc. IJCAI-87, 1\/}, pp. 483--487.
Dennett, D.C. (1971). Intentional Systems, {\it Journal of Philosophy 68},
No. 4.
Doyle, Jon (1979).
A Truth Maintenance System,
{\it Artificial Intelligence, 12\/}, pp. 231--272.
Etherington, D., Mercer, R. and Reiter, R. (1985). On the Adequacy
of Predicate Circumscription for Closed-World Reasoning, {\it Computational
Intelligence 1}.
Fikes, R. and Nils Nilsson (1971).
STRIPS: A New Approach to the Application of
Theorem Proving to Problem Solving, {\it Artificial Intelligence 2},
Numbers 3,4, January,
pp. 189--208.
Finger, J. (1986). {\it Exploiting Constraints in Design Synthesis}, PhD thesis,
Stanford University, Stanford.
Gelfond, Michael (1988).
{\it Autoepistemic Logic and Formalization of Commonsense
Reasoning}, preprint.
Gelfond, Michael and Vladimir Lifschitz (1988).
The Stable Model Semantics for Logic Programming. In: R. A. Kowalski and K. A.
Bowen (eds.), {\it Logic Programming: Proceedings of the Fifth International
Conference and Symposium, 2\/}, pp. 1070--1080.
Gelfond, Michael and Halina Przymusinska (1986).
Negation as Failure: Careful Closure Procedure. {\it Artificial Intelligence,
30}, Number 3, pp. 273--288.
Genesereth, Michael and Nils Nilsson (1987). {\it Logical
Foundations of Artificial Intelligence},
Morgan-Kaufmann.
Ginsberg, Matthew L. (1988)
A Circumscriptive Theorem Prover: Preliminary Report. In: {\it Proc. AAAI-88 2\/},
pp. 470--474.
Ginsberg, Matthew L. and Smith, D. (1986).
{\it Reasoning About Action I: A Possible World
Approach}, forthcoming.
Green, Cordell (1969).
Application of Theorem Proving to Problem Solving. In: {\it Proc. IJCAI-1},
pp. 219--239.
Hanks, Steve and Drew McDermott (1985).
{\it Temporal Reasoning and Default Logics},
Technical Report YALEU/CSD/RR\#430, Yale University.
Hanks, Steve and Drew McDermott (1986).
Default Reasoning, Nonmonotonic Logics, and the Frame Problem. In:
{\it Proc. AAAI-86, 1\/}, pp. 328--333.
Hanks, Steve and Drew McDermott (1987).
Nonmonotonic Logic and Temporal Projection,
{\it Artificial Intelligence 33\/}, Number 3, pp. 379--412.
Haugh, Brian A. (1988).
Tractable Theories of Multiple Defeasible Inheritance in Ordinary
Nonmonotonic Logics. In:
{\it Proc. AAAI-88, 2\/}, pp. 421--426.
Kautz, H. (1986). The logic of persistence,
{\it Proc. AAAI-86, 1}, pp. 401--405.
Ketonen, Jussi and Joseph S. Weening (1984). {\it EKL --- An Interactive
Proof Checker, User's Reference Manual}, Computer Science Department,
Stanford University.
de Kleer, Johan (1986).
An Assumption-Based Truth Maintenance System,
{\it Artificial Intelligence 28\/}, pp. 127--162.
Kolaitis, Phokion G. and Christos H. Papadimitriou (1988).
Some Computational Aspects of Circumscription, {\it Proc. AAAI-88, 2\/}, pp. 465--469.
Konolige, Kurt (1987).
On the Relation Between Default and Autoepistemic Logic.
In: M. L. Ginsberg (ed.), {\it Readings in Nonmonotonic Reasoning},
Morgan-Kaufmann, pp. 195--226.
Kowalski, Robert (1979). {\it Logic for Problem Solving},
North-Holland, Amsterdam.
Kowalski, Robert and Marek Sergot (1986). A Logic-based Calculus of
Events, {\it New Generation Computing 4}, pp. 67--95, Ohmsha Ltd. and
Springer Verlag.
Levesque, Hector (1987).
All I Know: An Abridged Report. In: {\it Proc. AAAI-87, 2\/}, pp. 426--431.
Lifschitz, Vladimir (1985).
Computing Circumscription. In: {\it Proc. AAAI-85, 1\/}, pp. 121--127.
Lifschitz, Vladimir (1986).
On the Satisfiability of Circumscription,
{\it Artificial Intelligence, 28\/}, pp. 17--27.
Lifschitz, Vladimir (1987).
Formal Theories of Action. In: F. M. Brown (ed.),
{\it The Frame Problem in Artificial Intelligence}, Morgan Kaufmann, pp. 35--58.
Lifschitz, Vladimir (1987a).
Pointwise Circumscription.
In: M. L. Ginsberg (ed.), {\it Readings in Nonmonotonic Reasoning},
Morgan-Kaufmann, pp. 179--193.
Lifschitz, Vladimir (1988).
On the Declarative Semantics of Logic Programs with Negation.
In: J. Minker (Ed.), {\it Foundations of Deductive Databases and Logic
Programming}, Morgan Kaufmann, pp. 177--192.
Lifschitz, Vladimir (1988a).
{\it Between Circumscription and Autoepistemic Logic}. (In preparation.)
Lifschitz, Vladimir and Arkady Rabinov (1988).
Miracles in Formal Theories of Action. {\it Artificial Intelligence} (to appear).
McCarthy, John (1959). Programs with Common Sense. In: {\it Proceedings of the
Teddington Conference on the Mechanization of Thought Processes}, London:
Her Majesty's Stationery Office.
McCarthy, John (1977).
Epistemological Problems of Artificial Intelligence. In: {\it Proc.
IJCAI-77}, pp. 1038--1044.
McCarthy, John (1979).
Ascribing Mental Qualities to Machines. In: Ringle, M. (Ed.), {\it
Philosophical Perspectives in Artificial Intelligence}, Humanities Press.
McCarthy, John (1979a).
First Order Theories of Individual Concepts and Propositions.
In: Michie, Donald (Ed.), {\it Machine Intelligence 9}, Ellis Horwood.
McCarthy, John (1980).
Circumscription --- A Form of Non-Monotonic Reasoning, {\it Artificial
Intelligence}, Volume 13, Numbers 1,2.
McCarthy, John (1982). Common Business Communication Language, in
{\it Textverarbeitung und B\"urosysteme}, Albert Endres and J\"urgen Reetz, eds.
R. Oldenbourg Verlag, Munich and Vienna 1982.
McCarthy, John (1983).
AI Needs more Emphasis on Basic Research,
{\it AI Magazine 4\/}, Number 4.
McCarthy, John (1986).
Applications of Circumscription to Formalizing Common Sense Knowledge,
{\it Artificial Intelligence 28\/}, pp. 89--116.
McCarthy, John (1988).
Mathematical Logic in Artificial Intelligence.
{\it D\ae dalus}, Winter 1988, pp. 297--311.
McCarthy, John and Patrick Hayes (1969). Some Philosophical
Problems from the Standpoint of Artificial Intelligence,
in B. Meltzer and D. Michie (eds), {\it Machine Intelligence 4},
Edinburgh University.
McDermott, D. and J. Doyle (1980).
Nonmonotonic Logic I, {\it Artificial Intelligence 13\/},
Number 1.
McDermott, D. (1987).
A Critique of Pure Reason, {\it Computational Intelligence 3}, Number 3 (with
peer commentaries).
Minker, Jack and Donald Perlis (1985).
Computing Protected Circumscription. {\it Logic Programming Journal, 4}, pp.
235--249.
Montague, Richard (1963). Syntactical Treatments of Modality, with
Corollaries on Reflexion Principles and Finite Axiomatizability,
{\it Acta Philosophica Fennica 16\/}, pp. 153--167.
(Reprinted in (Montague 1974).)
Montague, Richard (1974). {\it Formal Philosophy}, Yale University Press.
Moore, Robert (1985). Semantical considerations on nonmonotonic logic,
{\it Artificial Intelligence 25\/}, Number 1.
Morgenstern, Leora and Lynn A. Stein (1988).
Why Things Go Wrong: A Formal Theory of Causal Reasoning. In: {\it Proc. AAAI-88,
2\/}, pp. 518--523.
Morris, Paul (1987).
Curing Anomalous Extensions. In:
{\it Proc. AAAI-87, 2\/}, pp. 437--442.
Morris, Paul (1988).
Stable Closures, Defeasible Logic and Contradiction Tolerant Reasoning. In:
{\it Proc. AAAI-88, 2\/}, pp. 506--511.
Newell, Allen (1981). The Knowledge Level, {\it AI Magazine 2\/},
Number 2.
Perlis, Donald (1988). Autocircumscription. {\it Artificial Intelligence, 36},
Number 2, pp. 223--236.
Poole, David L., R. G. Goebel and R. Aleliunas (1987).
Theorist: A Logical Reasoning System for Defaults and Diagnosis, In: N. Cercone
and G. McCalla (Eds.), {\it The Knowledge Frontier:
Essays in the Representation of Knowledge}, Springer, pp. 331--352.
Przymusinski, Teodor (1986).
Query-Answering in Circumscriptive and Closed-World Theories. In: {\it Proc.
AAAI-86, 1\/}, pp. 186--190.
Przymusinski, Teodor (1988).
On the Declarative Semantics of Deductive Databases and Logic Programs.
In: J. Minker (Ed.), {\it Foundations of Deductive Databases and Logic
Programming}, Morgan Kaufmann, pp. 177--192.
Quine, W. V. (1987). {\it Quiddities}, Harvard University Press.
Rabinov, Arkady (1988).
A Generalization of Collapsible Cases of Circumscription.
{\it Artificial Intelligence} (to appear).
Rabinov, Arkady (1988a).
{\it On Ramifications, Qualifications and Domain Constraints in Formal
Theories of Action}, submitted for publication.
Reiter, Raymond A. (1980). A Logic for Default Reasoning. {\it Artificial
Intelligence 13\/}, Numbers 1,2, pp. 81--132.
Reiter, Raymond (1980a). Equality and domain closure in first order
data bases, {\it J. ACM, 27}, Apr. 1980, pp. 235--249.
Reiter, Raymond (1982). Circumscription Implies Predicate Completion
(Sometimes), {\it Proceedings of the National Conference on Artificial
Intelligence, AAAI-82}, William Kaufman, Inc.
Reiter, Raymond A. (1987). Theory of Diagnosis from First Principles.
{\it Artificial Intelligence 32\/}, pp. 57--95.
Robinson, J. Allen (1965). A Machine-oriented Logic Based
on the Resolution Principle, {\it JACM 12}, Number 1, pp. 23--41.
Searle, John R. (1980).
Minds, Brains and Programs, {\it The Behavioral and Brain Sciences, 3}, Cambridge
University Press.
Shoham, Yoav (1986).
Chronological ignorance: Time, nonmonotonicity, necessity and causal theories,
{\it Proc. AAAI-86, 1}, pp. 389--393.
Sterling, Leon and Ehud Shapiro (1986). {\it The Art of Prolog}, MIT Press.
Sussman, Gerald J., Terry Winograd, and
Eugene Charniak (1971). {\it Micro-planner Reference Manual,} Report AIM-203A,
Artificial Intelligence Laboratory, Massachusetts Institute of Technology,
Cambridge.
Touretzky, David S., John F. Horty and Richmond H. Thomason (1987).
A Clash of Intuitions: The Current State of Nonmonotonic Multiple Inheritance
Systems. In: {\it Proc. IJCAI-87, 1\/}, pp. 476--482.
\bigbreak
}
\vfill\eject\end
\smallskip\noindent{This draft of nsf[w88,jmc]\ TEXed on \jmcdate\ at \theTime}
\centerline{\bf Budget}
\bigskip
\noindent
PERIOD COVERED: 3 years: 1 January 1989 through 31 December 1991.
\smallskip
\+Dates: &&&&&&1/1/89--12/31/89 &&&&1/1/90--12/31/90 &&&&1/1/91--12/31/91\cr
\smallskip
\+ SALARIES AND WAGES &&&&&&&$\qquad$Person- &&&&$\qquad$Person- &&&&$\qquad$Person-\cr
\+ &&&&&&&$\qquad$months &&&&$\qquad$months &&&&$\qquad$months\cr
\smallskip
\item{1.} Senior Personnel:
\smallskip
\+&a. John McCarthy, &&&&&00,000&&9.0 &&00,000&&9.0 &&00,000&&9.0 \cr
\+&$\quad$ Professor \cr
\+&$\quad$ Summer 75\% \cr
\+&$\quad$ Acad. Yr. 0\% \cr
\smallskip
\item{2.} Other Personnel:
\smallskip
\+&a. Research Associates \cr
\+&$\quad$ (1) V. Lifschitz,&&&&& && &&00,000&&12.0 &&00,000&&12.0 \cr
\+&$\qquad$100\% -- 12 months \cr
\smallskip
\+&$\quad$ (2) A. Rabinov,&&&&&00,000&&12.0 &&00,000&&12.0 &&00,000&&12.0 \cr
\+&$\qquad$100\% -- 12 months \cr
\end
\subsection{\bf Biography of Arkady Rabinov}
\smallskip
\beginlsverbatim
BORN:
April 20, 1947 in Leningrad, USSR
EDUCATION:
Ph.D. in Computer Science from Leningrad Institute of Electrical Communications,
1973 (Degree was not formally awarded as a result of my application for emigration.)
Degree in Electrical Engineering (B.S., M.S. equivalent) from Leningrad Institute
of Electrical Communications, 1970
EXPERIENCE:
Research Associate, Stanford University, 1986--now
President, Command Software, Walnut Creek, California, 1982--1986
Senior Consultant, Sorcim Corporation, Santa Clara, California, 1981--1982
Consultant, Control Data Corporation, Sunnyvale, California, 1979--1981
Software Design Programmer, Mohawk Data Science, Los Gatos, California, 1978--1979
OTHER PROFESSIONAL ACTIVITIES:
Refereed papers for Artificial Intelligence Journal, FGCS-88 (International
Conference on Fifth Generation Computer Systems 1988)
\endlsverbatim
\bigbreak
{
\parindent=20pt
\noindent{PUBLICATIONS:}
\medskip
\item{1.} ``An Analogue Model of Parametric Filters'', {\it Proceedings of 3rd
Conference of Leningrad Institute of Electrical Communications},
1969 (with E. Finkelstein).
\item{2.} ``On Minimax Response of Electrical Filters'', {\it Proceedings of 4th
Conference of Leningrad Institute of Electrical Communications},
1970 (with T. Guitor).
\item{3.} ``Notes on Parametric Circuitry Synthesis'', {\it Transactions of
Leningrad Institute of Electrical Communications}, Vol. 53,
1971 (with E. Finkelstein).
\item{4.} ``Worst-case error of a Prediction Algorithm for Second Order
Delta-Modulation'', {\it Proceedings of 5th conference of Leningrad
Institute of Electrical Communications}, 1971.
\item{5.} ``Improved algorithm of Second Order Delta-Modulation'', {\it
Proceedings
of 6th conference of Leningrad Institute of Electrical
Communications}, 1972.
\item{6.} ``Restriction of Factoring in Binary Resolution'', {\it
Proceedings of 9th
International Conference on Automated Deduction}, Chicago, 1988.
\item{7.} ``Programming in QLISP - Case Study'', Stanford Technical Report, 1988.
\item{8.} ``A Generalization of Collapsible Cases of Circumscription'', accepted
for publication in {\it Artificial Intelligence}.
\item{9.} ``Miracles in Formal Theories of Action'', accepted
for publication in {\it Artificial Intelligence} (with V. Lifschitz).
\item{10.} ``On Ramifications, Qualifications and Domain Constraints in Formal
Theories of Action'', submitted for publication in {\it Artificial
Intelligence}.
\item{11.} ``First Order Theories of Quantification'', in preparation.
\item{12.} ``Learning as Circumscriptive Theories'', in preparation.
}
\bigbreak
\nopagenumbers
\noindent{\bf Results of Previous Work Supported by NSF}
\medskip
The main theme of McCarthy's work on NSF Grant DCR 84-14393
{\it Basic Research in Artificial Intelligence} is the
development of the concept of circumscription and of its applications to
formalizing the nonmonotonic aspects of commonsense knowledge and
reasoning.
The key technical advance in the theory of circumscription is the introduction
of {\it formula circumscription}, which minimizes the extent of some particular
predicate while the extents of some other predicates are {\it varied}. This
generalization is crucial for most applications of circumscription. A
further extension, {\it prioritized} circumscription, is needed for more
advanced applications. Another idea that was found useful by many
researchers is expressing defaults by means of {\it abnormality predicates}.
Applications of circumscription studied by McCarthy include: the semantics of
inheritance hierarchies with exceptions; the unique names hypothesis; reasoning
about action and change. Some of these uses of circumscription were formally
verified in an interactive theorem prover.
This work is represented by the following papers:
{\bf McCarthy, John (1986)}:
``Applications of Circumscription to Formalizing Common Sense Knowledge,''
{\it Artificial Intelligence}, April 1986.
% circum.tex[f83,jmc]
{\bf McCarthy, John (1987)}:
``Review of {\it Artificial Intelligence --- The Very Idea} by John
Haugeland'' to appear in {\it SIAM News}.
% haugel[f86,jmc]
{\bf McCarthy, John (1987)}:
``Generality in Artificial Intelligence'', {\it Communications of the ACM},
Vol. 30, No. 12, pp. 1030--1035.
% genera[w86,jmc]
{\bf McCarthy, John (1987)}:
``Mathematical Logic in Artificial Intelligence'', in
{\it Daedalus}, vol. 117, No. 1, American Academy of Arts and Sciences,
Winter 1988.
% logic.2[w87,jmc]
{\bf McCarthy, John (1988)}:
``Artificial Intelligence'', for {\it Collier's Encyclopedia}.
% artifi.3[w88,jmc]
Of these, the first and third are the most substantial scientifically;
the others are more expository.
In 1988 McCarthy received the Kyoto Prize of the Inamori
Foundation for his work in artificial intelligence, especially in
formalizing common sense knowledge and reasoning, the area in
which he has received NSF support. The award included a gold
medal and a cash prize of about \$360,000.
Further work described in this proposal will develop nonmonotonic inference as
discussed in {\it Applications of Circumscription to Formalizing Common Sense
Knowledge} and the formalization of contexts as indicated in {\it Generality
in Artificial Intelligence}.
\end